Publications

Note: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder. Please also observe the IEEE Copyright, ACM Copyright and Springer Copyright Notices.

Preprints and Selected Papers
  • Neural Topic Model via Optimal Transport
    He Zhao, Dinh Phung, Viet Huynh, Trung Le and Wray Buntine. In Proc. of the 9th Int. Conf. on Learning Representations (ICLR), 2021.
    @INPROCEEDINGS { zhao_etal_iclr2021_neural,
        AUTHOR = { He Zhao and Dinh Phung and Viet Huynh and Trung Le and Wray Buntine },
        BOOKTITLE = { Proc. of the 9th Int. Conf. on Learning Representations (ICLR) },
        TITLE = { Neural Topic Model via Optimal Transport },
        YEAR = { 2021 },
        ARCHIVEPREFIX = { arXiv },
        EPRINT = { 2008.13537 },
        PRIMARYCLASS = { cs.IR },
        TIMESTAMP = { 2021-01-13 },
    }
  • Parameterized Rate-Distortion Stochastic Encoder
    Quan Hoang, Trung Le and Dinh Phung. In Proc. of the 37th International Conference on Machine Learning (ICML), 2020.
    @INPROCEEDINGS { hoang_etal_icml20_parameterized,
        AUTHOR = { Quan Hoang and Trung Le and Dinh Phung },
        BOOKTITLE = { Proc. of the 37th International Conference on Machine Learning (ICML) },
        TITLE = { Parameterized Rate-Distortion Stochastic Encoder },
        YEAR = { 2020 },
    }
  • A Relational Memory-based Embedding Model for Triple Classification and Search Personalization
    Dai Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung. In Proc. of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020.
    Knowledge graph embedding methods often suffer from a limitation of memorizing valid triples to predict new ones for triple classification and search personalization problems. To this end, we introduce a novel embedding model, named R-MeN, that explores a relational memory network to encode potential dependencies in relationship triples. R-MeN considers each triple as a sequence of 3 input vectors that recurrently interact with a memory using a transformer self-attention mechanism. Thus R-MeN encodes new information from interactions between the memory and each input vector to return a corresponding vector. Consequently, R-MeN feeds these 3 returned vectors to a convolutional neural network-based decoder to produce a scalar score for the triple. Experimental results show that our proposed R-MeN obtains state-of-the-art results on SEARCH17 for the search personalization task, and on WN11 and FB13 for the triple classification task.
    @INPROCEEDINGS { nguyen_etal_acl9_relational,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dinh Phung },
        BOOKTITLE = { Proc. of the 58th Annual Meeting of the Association for Computational Linguistics (ACL) },
        TITLE = { A Relational Memory-based Embedding Model for Triple Classification and Search Personalization },
        YEAR = { 2020 },
        ABSTRACT = { Knowledge graph embedding methods often suffer from a limitation of memorizing valid triples to predict new ones for triple classification and search personalization problems. To this end, we introduce a novel embedding model, named R-MeN, that explores a relational memory network to encode potential dependencies in relationship triples. R-MeN considers each triple as a sequence of 3 input vectors that recurrently interact with a memory using a transformer self-attention mechanism. Thus R-MeN encodes new information from interactions between the memory and each input vector to return a corresponding vector. Consequently, R-MeN feeds these 3 returned vectors to a convolutional neural network-based decoder to produce a scalar score for the triple. Experimental results show that our proposed R-MeN obtains state-of-the-art results on SEARCH17 for the search personalization task, and on WN11 and FB13 for the triple classification task. },
        FILE = { :nguyen_etal_acl9_relational - A Relational Memory Based Embedding Model for Triple Classification and Search Personalization.PDF:PDF },
        URL = { https://arxiv.org/abs/1907.06080 },
    }
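As an illustrative aside, the transformer self-attention step that R-MeN applies to the three triple vectors can be sketched generically as follows. This is a minimal single-head sketch with arbitrary dimensions, not the authors' memory-augmented architecture.

```python
import numpy as np

# Minimal single-head self-attention over the three triple vectors (s, r, o).
# Hypothetical names and sizes; not the authors' memory-augmented network.

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 6
X = rng.normal(size=(3, d))              # rows: subject, relation, object embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

Q, K, V = X @ Wq, X @ Wk, X @ Wv
attn = softmax(Q @ K.T / np.sqrt(d))     # (3, 3): how much each element attends to the others
out = attn @ V                           # three refined vectors, one per triple element

print(out.shape)                         # (3, 6)
```

Each returned vector mixes information from all three triple elements, which is the "interaction" step the abstract describes before the convolutional decoder scores the triple.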
  • Deep Generative Models of Sparse and Overdispersed Discrete Data
    He Zhao, Piyush Rai, Lan Du, Wray Buntine, Dinh Phung and Mingyuan Zhou. In Proc. of the 23rd Int. Conf. on Artificial Intelligence and Statistics (AISTATS), 2020.
    In this paper, we propose a variational autoencoder based framework that generates discrete data, including both count-valued and binary data, via the negative-binomial distribution. We also examine the model’s ability to capture self- and cross-excitations in discrete data, which are critical for modelling overdispersion. We conduct extensive experiments on text analysis and collaborative filtering. Compared with several state-of-the-art baselines, the proposed models achieve significantly better performance on the above problems. By achieving superior modelling performance with a simple yet effective Bayesian extension to VAEs, we demonstrate that it is feasible to adapt the knowledge and experience of Bayesian probabilistic matrix factorisation into newly-developed deep generative models.
    @INPROCEEDINGS { zhao_etal_aistats20_deepgenerative,
        AUTHOR = { He Zhao and Piyush Rai and Lan Du and Wray Buntine and Dinh Phung and Mingyuan Zhou },
        TITLE = { Deep Generative Models of Sparse and Overdispersed Discrete Data },
        BOOKTITLE = { Proc. of the 23rd Int. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2020 },
        ABSTRACT = { In this paper, we propose a variational autoencoder based framework that generates discrete data, including both count-valued and binary data, via the negative-binomial distribution. We also examine the model’s ability to capture self- and cross-excitations in discrete data, which are critical for modelling overdispersion. We conduct extensive experiments on text analysis and collaborative filtering. Compared with several state-of-the-art baselines, the proposed models achieve significantly better performance on the above problems. By achieving superior modelling performance with a simple yet effective Bayesian extension to VAEs, we demonstrate that it is feasible to adapt the knowledge and experience of Bayesian probabilistic matrix factorisation into newly-developed deep generative models. },
        FILE = { :zhao_etal_aistats20_deepgenerative - Deep Generative Models of Sparse and Overdispersed Discrete Data.pdf:PDF },
        URL = { https://www.semanticscholar.org/paper/Deep-Generative-Models-of-Sparse-and-Overdispersed-Zhao-Rai/8136c46488875b09e15e89c08bf02698901322a1 },
    }
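As an illustrative aside, the overdispersion the negative-binomial decoder is designed to capture is a standard distributional fact: for mean m and dispersion r, the negative-binomial variance is m + m²/r, which strictly exceeds the mean (unlike the Poisson). A quick numerical check with arbitrarily chosen parameters, not code from the paper:

```python
import numpy as np

# Quick numerical check (standard facts, not code from the paper): a
# negative-binomial with mean m and dispersion r has variance m + m**2 / r,
# strictly larger than its mean -- the overdispersion a Poisson cannot model.
# All parameter values are arbitrary.

rng = np.random.default_rng(0)

m, r = 5.0, 2.0
p = r / (r + m)   # NumPy's (n, p) parameterisation gives E[X] = n * (1 - p) / p = m

samples = rng.negative_binomial(r, p, size=200_000)

print(abs(samples.mean() - m) < 0.1)    # empirical mean close to m = 5
print(samples.var() > samples.mean())   # variance exceeds mean: overdispersed
```

Here the theoretical variance is 5 + 25/2 = 17.5 against a mean of 5, which the empirical moments reproduce.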
  • Learning Generative Adversarial Networks from Multiple Data Sources
    Trung Le, Quan Hoang, Hung Vu, Tu Dinh Nguyen, Hung Bui and Dinh Phung. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), pages 2823-2829, July 2019.
    Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. For this problem, we enrich both GANs’ formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN’s effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair.
    @INPROCEEDINGS { le_etal_ijcai19_learningGAN,
        AUTHOR = { Trung Le and Quan Hoang and Hung Vu and Tu Dinh Nguyen and Hung Bui and Dinh Phung },
        TITLE = { Learning Generative Adversarial Networks from Multiple Data Sources },
        BOOKTITLE = { Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI) },
        YEAR = { 2019 },
        PAGES = { 2823--2829 },
        MONTH = { July },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        ABSTRACT = { Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. For this problem, we enrich both GANs’ formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN’s effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair. },
        FILE = { :le_etal_ijcai19_learningGAN - Learning Generative Adversarial Networks from Multiple Data Sources.pdf:PDF },
        URL = { https://www.ijcai.org/Proceedings/2019/391 },
    }
  • Three-Player Wasserstein GAN via Amortised Duality
    Nhan Dam, Quan Hoang, Trung Le, Tu Dinh Nguyen, Hung Bui and Dinh Phung. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), pages 2202-2208, July 2019.
    We propose a new formulation for learning generative adversarial networks (GANs) using optimal transport cost (the general form of Wasserstein distance) as the objective criterion to measure the dissimilarity between the target distribution and the learned distribution. Our formulation is based on the general form of the Kantorovich duality, which is applicable to optimal transport with a wide range of cost functions that are not necessarily metrics. To make optimising this duality form amenable to gradient-based methods, we employ a function that acts as an amortised optimiser for the innermost optimisation problem. Interestingly, the amortised optimiser can be viewed as a mover since it strategically shifts around data points. The resulting formulation is a sequential min-max-min game with 3 players: the generator, the critic, and the mover, where the new player, the mover, attempts to fool the critic by shifting the data around. Despite involving three players, we demonstrate that our proposed formulation can be solved reasonably effectively via a simple alternative gradient learning strategy. Compared with the existing Lipschitz-constrained formulations of Wasserstein GAN on CIFAR-10, our model yields significantly better diversity scores than weight clipping and comparable performance to the gradient penalty method.
    @INPROCEEDINGS { dam_etal_ijcai19_3pwgan,
        AUTHOR = { Nhan Dam and Quan Hoang and Trung Le and Tu Dinh Nguyen and Hung Bui and Dinh Phung },
        BOOKTITLE = { Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI) },
        TITLE = { Three-Player {W}asserstein {GAN} via Amortised Duality },
        YEAR = { 2019 },
        MONTH = { July },
        PAGES = { 2202--2208 },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        ABSTRACT = { We propose a new formulation for learning generative adversarial networks (GANs) using optimal transport cost (the general form of Wasserstein distance) as the objective criterion to measure the dissimilarity between the target distribution and the learned distribution. Our formulation is based on the general form of the Kantorovich duality, which is applicable to optimal transport with a wide range of cost functions that are not necessarily metrics. To make optimising this duality form amenable to gradient-based methods, we employ a function that acts as an amortised optimiser for the innermost optimisation problem. Interestingly, the amortised optimiser can be viewed as a mover since it strategically shifts around data points. The resulting formulation is a sequential min-max-min game with 3 players: the generator, the critic, and the mover, where the new player, the mover, attempts to fool the critic by shifting the data around. Despite involving three players, we demonstrate that our proposed formulation can be solved reasonably effectively via a simple alternative gradient learning strategy. Compared with the existing Lipschitz-constrained formulations of Wasserstein GAN on CIFAR-10, our model yields significantly better diversity scores than weight clipping and comparable performance to the gradient penalty method. },
        FILE = { :dam_etal_ijcai19_3pwgan - Three Player Wasserstein GAN Via Amortised Duality.pdf:PDF },
        URL = { https://www.ijcai.org/Proceedings/2019/305 },
    }
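For reference, the general-cost Kantorovich duality this formulation builds on (standard optimal-transport material stated here for convenience, not quoted from the paper) can be written as:

```latex
W_c(\mu,\nu) \;=\; \sup_{\varphi}\;
  \mathbb{E}_{y\sim\nu}\big[\varphi(y)\big]
  \;+\; \mathbb{E}_{x\sim\mu}\Big[\inf_{z}\,\big(c(x,z)-\varphi(z)\big)\Big].
```

Reading the abstract in these terms, the potential φ corresponds to the critic, and amortising the inner infimum with a network z = m(x) yields the third player, the mover.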
  • Learning How to Active Learn by Dreaming
    Thuy-Trang Vu, Ming Liu, Dinh Phung and Gholamreza Haffari. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy, July 2019.
    @INPROCEEDINGS { vu_etal_acl19_learning,
        AUTHOR = { Thuy-Trang Vu and Ming Liu and Dinh Phung and Gholamreza Haffari },
        TITLE = { Learning How to Active Learn by Dreaming },
        BOOKTITLE = { Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL) },
        YEAR = { 2019 },
        ADDRESS = { Florence, Italy },
        MONTH = { jul },
    }
  • A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization
    Dai Quoc Nguyen, Thanh Vu, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL), Minneapolis, USA, June 2019.
    In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). Our CapsE represents each triple as a 3-column matrix where each column vector represents the embedding of an element in the triple. This 3-column matrix is then fed to a convolution layer where multiple filters are applied to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via a dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains state-of-the-art link prediction results for knowledge graph completion on two benchmark datasets: WN18RR and FB15k-237, and outperforms strong search personalization baselines on the SEARCH17 dataset.
    @INPROCEEDINGS { nguyen_etal_naaclhtl19_acapsule,
        AUTHOR = { Dai Quoc Nguyen and Thanh Vu and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung },
        TITLE = { A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization },
        BOOKTITLE = { Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL) },
        YEAR = { 2019 },
        ADDRESS = { Minneapolis, USA },
        MONTH = { jun },
        ABSTRACT = { In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). Our CapsE represents each triple as a 3-column matrix where each column vector represents the embedding of an element in the triple. This 3-column matrix is then fed to a convolution layer where multiple filters are applied to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via a dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains state-of-the-art link prediction results for knowledge graph completion on two benchmark datasets: WN18RR and FB15k-237, and outperforms strong search personalization baselines on the SEARCH17 dataset. },
        FILE = { :nguyen_etal_naaclhtl19_acapsule - A Capsule Network Based Embedding Model for Knowledge Graph Completion and Search Personalization.pdf:PDF },
        URL = { https://arxiv.org/abs/1808.04122 },
    }
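As an illustrative aside, the front end described in the abstract (a k x 3 triple matrix convolved with 1 x 3 filters to yield feature maps of length k) reduces to a small matrix product. The sketch below only checks shapes, with hypothetical sizes, and is not the authors' code:

```python
import numpy as np

# Shape-only sketch (hypothetical sizes, not the authors' code): a triple becomes
# a k x 3 matrix [v_s | v_r | v_o]; each 1 x 3 convolution filter slides down the
# k rows, so convolving with one filter is a dot product per row.

rng = np.random.default_rng(0)
k, n_filters = 8, 4                      # embedding dimension, number of filters

triple = rng.normal(size=(k, 3))         # columns: subject, relation, object embeddings
filters = rng.normal(size=(n_filters, 3))

feature_maps = triple @ filters.T        # (k, n_filters): one length-k map per filter
print(feature_maps.shape)                # (8, 4)
```

These feature maps are what the model then groups into capsules in the first capsule layer.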
  • Probabilistic Multilevel Clustering via Composite Transportation Distance
    Viet Huynh, Nhat Ho, Dinh Phung and Michael I. Jordan. In Proc. of the Int. Conf. on Artificial Intelligence and Statistics (AISTATS), Okinawa, Japan, April 2019.
    We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence. Our method involves solving a joint optimization problem over spaces of probability measures to simultaneously discover grouping structures within groups and among groups. By exploiting the connection of our method to the problem of finding composite transportation barycenters, we develop fast and efficient optimization algorithms even for potentially large-scale multilevel datasets. Finally, we present experimental results with both synthetic and real data to demonstrate the efficiency and scalability of the proposed approach.
    @INPROCEEDINGS { ho_etal_aistats19_probabilistic,
        AUTHOR = { Viet Huynh and Nhat Ho and Dinh Phung and Michael I. Jordan },
        TITLE = { Probabilistic Multilevel Clustering via Composite Transportation Distance },
        BOOKTITLE = { Proc. of the Int. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2019 },
        ADDRESS = { Okinawa, Japan },
        MONTH = { apr },
        ABSTRACT = { We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence. Our method involves solving a joint optimization problem over spaces of probability measures to simultaneously discover grouping structures within groups and among groups. By exploiting the connection of our method to the problem of finding composite transportation barycenters, we develop fast and efficient optimization algorithms even for potentially large-scale multilevel datasets. Finally, we present experimental results with both synthetic and real data to demonstrate the efficiency and scalability of the proposed approach. },
        FILE = { :ho_etal_aistats19_probabilistic - Probabilistic Multilevel Clustering Via Composite Transportation Distance.pdf:PDF },
        URL = { https://arxiv.org/abs/1810.11911 },
    }
  • Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection
    Tue Le, Tuan Nguyen, Trung Le, Dinh Phung, Paul Montague, Olivier De Vel and Lizhen Qu. In International Conference on Learning Representations (ICLR), 2019.
    @INPROCEEDINGS { le_etal_iclr18_maximal,
        AUTHOR = { Tue Le and Tuan Nguyen and Trung Le and Dinh Phung and Paul Montague and Olivier De Vel and Lizhen Qu },
        TITLE = { Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection },
        BOOKTITLE = { International Conference on Learning Representations (ICLR) },
        YEAR = { 2019 },
        FILE = { :le_etal_iclr18_maximal - Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection.pdf:PDF },
        URL = { https://openreview.net/forum?id=ByloIiCqYQ },
    }
  • Robust Anomaly Detection in Videos using Multilevel Representations
    Hung Vu, Tu Dinh Nguyen, Trung Le, Wei Luo and Dinh Phung. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), Honolulu, USA, 2019.
    @INPROCEEDINGS { vu_etal_aaai19_robustanomaly,
        AUTHOR = { Hung Vu and Tu Dinh Nguyen and Trung Le and Wei Luo and Dinh Phung },
        TITLE = { Robust Anomaly Detection in Videos using Multilevel Representations },
        BOOKTITLE = { Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2019 },
        ADDRESS = { Honolulu, USA },
        FILE = { :vu_etal_aaai19_robustanomaly - Robust Anomaly Detection in Videos Using Multilevel Representations.pdf:PDF },
        GROUPS = { Anomaly Detection },
        URL = { https://github.com/SeaOtter/vad_gan },
    }
  • Robust Bayesian Kernel Machine via Stein Variational Gradient Descent for Big Data
    Khanh Nguyen, Trung Le, Tu Nguyen, Geoff Webb and Dinh Phung. In Proc. of the 24th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), London, UK, August 2018.
    Kernel methods are powerful supervised machine learning models, valued for their strong ability to generalize to unseen data even when trained on limited data. However, most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization, making them infeasible to apply to large-scale datasets. This issue is exacerbated when kernel methods are used in conjunction with a grid search to tune their kernel parameters and hyperparameters, which raises the question of model robustness when applied to real datasets. In this paper, we propose a robust Bayesian Kernel Machine (BKM) – a Bayesian kernel machine that exploits the strengths of both Bayesian modelling and kernel methods. A key challenge for such a formulation is the need for an efficient learning algorithm. To this end, we successfully extended the recent Stein variational theory for Bayesian inference to our proposed model, resulting in fast and efficient learning and prediction algorithms. Importantly, our proposed BKM is resilient to the curse of kernelization, making it applicable to large-scale datasets and robust to parameter tuning, avoiding the expense and potential pitfalls associated with the current practice of parameter tuning. Our extensive experimental results on 12 benchmark datasets show that our BKM, without tuning any parameter, achieves predictive performance comparable to the state-of-the-art LIBSVM and significantly outperforms other baselines, while obtaining a significant speedup in total training time compared with its rivals.
    @INPROCEEDINGS { nguyen_etal_kdd18_robustbayesian,
        AUTHOR = { Khanh Nguyen and Trung Le and Tu Nguyen and Geoff Webb and Dinh Phung },
        TITLE = { Robust Bayesian Kernel Machine via Stein Variational Gradient Descent for Big Data },
        BOOKTITLE = { Proc. of the 24th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD) },
        YEAR = { 2018 },
        ADDRESS = { London, UK },
        MONTH = { aug },
        PUBLISHER = { ACM },
        ABSTRACT = { Kernel methods are powerful supervised machine learning models, valued for their strong ability to generalize to unseen data even when trained on limited data. However, most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization, making them infeasible to apply to large-scale datasets. This issue is exacerbated when kernel methods are used in conjunction with a grid search to tune their kernel parameters and hyperparameters, which raises the question of model robustness when applied to real datasets. In this paper, we propose a robust Bayesian Kernel Machine (BKM) – a Bayesian kernel machine that exploits the strengths of both Bayesian modelling and kernel methods. A key challenge for such a formulation is the need for an efficient learning algorithm. To this end, we successfully extended the recent Stein variational theory for Bayesian inference to our proposed model, resulting in fast and efficient learning and prediction algorithms. Importantly, our proposed BKM is resilient to the curse of kernelization, making it applicable to large-scale datasets and robust to parameter tuning, avoiding the expense and potential pitfalls associated with the current practice of parameter tuning. Our extensive experimental results on 12 benchmark datasets show that our BKM, without tuning any parameter, achieves predictive performance comparable to the state-of-the-art LIBSVM and significantly outperforms other baselines, while obtaining a significant speedup in total training time compared with its rivals. },
        FILE = { :nguyen_etal_kdd18_robustbayesian - Robust Bayesian Kernel Machine Via Stein Variational Gradient Descent for Big Data.pdf:PDF },
    }
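As an illustrative aside, the Stein variational gradient descent machinery the paper extends can be sketched in its generic form: particles follow the kernelized update φ(x_i) = (1/n) Σ_j [k(x_j, x_i) ∇ log p(x_j) + ∇_{x_j} k(x_j, x_i)]. The toy target below is a standard 1-D Gaussian; the step size, bandwidth, and particle count are arbitrary, and this is not the paper's BKM algorithm:

```python
import numpy as np

# Generic SVGD sketch (toy target, not the paper's BKM): particles x_i follow
# phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) * grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
# with an RBF kernel; the target p is a standard 1-D Gaussian, so grad log p(x) = -x.

def svgd_step(x, step=0.1, h=1.0):
    diff = x - x.T                       # (n, n) pairwise differences x_i - x_j
    K = np.exp(-diff**2 / (2 * h**2))    # RBF kernel matrix
    grad_K = diff * K / h**2             # entry (i, j) holds grad_{x_j} k(x_j, x_i)
    score = -x                           # grad log p for p = N(0, 1)
    n = x.shape[0]
    phi = (K @ score + grad_K.sum(axis=1, keepdims=True)) / n
    return x + step * phi

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=0.5, size=(100, 1))   # particles start far from the target
for _ in range(500):
    x = svgd_step(x)

print(abs(float(x.mean())) < 0.5)   # particles have drifted toward N(0, 1)
```

The first term pulls particles toward high-density regions of the target, while the kernel-gradient term acts as a repulsive force keeping them spread out.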
  • MGAN: Training Generative Adversarial Nets with Multiple Generators
    Quan Hoang, Tu Dinh Nguyen, Trung Le and Dinh Phung. In International Conference on Learning Representations (ICLR), 2018.
    We propose in this paper a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapsing problem and delivering state-of-the-art results. A minimax formulation is established among a classifier, a discriminator, and a set of generators in a spirit similar to GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them is randomly selected as the final output, similar to the mechanism of a probabilistic mixture model. We term our method Mixture Generative Adversarial Nets (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators’ distributions and the empirical data distribution is minimal, whilst the JSD among generators’ distributions is maximal, hence effectively avoiding the mode collapsing problem. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over the latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by the generators.
    @INPROCEEDINGS { hoang_etal_iclr18_mgan,
        AUTHOR = { Quan Hoang and Tu Dinh Nguyen and Trung Le and Dinh Phung },
        TITLE = { {MGAN}: Training Generative Adversarial Nets with Multiple Generators },
        BOOKTITLE = { International Conference on Learning Representations (ICLR) },
        YEAR = { 2018 },
        ABSTRACT = { We propose in this paper a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapsing problem and delivering state-of-the-art results. A minimax formulation is established among a classifier, a discriminator, and a set of generators in a spirit similar to GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them is randomly selected as the final output, similar to the mechanism of a probabilistic mixture model. We term our method Mixture Generative Adversarial Nets (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators’ distributions and the empirical data distribution is minimal, whilst the JSD among generators’ distributions is maximal, hence effectively avoiding the mode collapsing problem. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over the latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by the generators. },
        FILE = { :hoang_etal_iclr18_mgan - MGAN_ Training Generative Adversarial Nets with Multiple Generators.pdf:PDF },
        URL = { https://openreview.net/forum?id=rkmu5b0a- },
    }
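As an illustrative aside, the mixture sampling mechanism described in the abstract (pick one of K generators at random, then emit its sample) can be sketched with stand-in one-dimensional "generators". Everything below is hypothetical toy code, not the authors' implementation:

```python
import numpy as np

# Illustrative toy sketch (not the authors' code): MGAN's sampling mechanism --
# draw a generator index uniformly, then output that generator's sample, as in
# a probabilistic mixture model. The three "generators" are stand-in functions
# that each cover one data mode.

rng = np.random.default_rng(42)

generators = [
    lambda z: z - 4.0,   # generator specialising in the mode at -4
    lambda z: z + 0.0,   # generator specialising in the mode at 0
    lambda z: z + 4.0,   # generator specialising in the mode at +4
]

def sample_mgan(n):
    z = 0.1 * rng.normal(size=n)                  # shared latent noise
    idx = rng.integers(len(generators), size=n)   # uniform mixture over generators
    return np.array([generators[i](zi) for i, zi in zip(idx, z)])

samples = sample_mgan(30_000)

# Every mode receives roughly a third of the samples, so none collapses.
covered = [np.mean(np.abs(samples - m) < 1.0) for m in (-4.0, 0.0, 4.0)]
print(all(0.25 < c < 0.42 for c in covered))  # True
```

Because each generator specialises in a mode but the output is drawn from the mixture, all modes stay covered, which is the intuition behind MGAN's resistance to mode collapse.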
  • Geometric enclosing networks
    Trung Le, Hung Vu, Tu Dinh Nguyen and Dinh Phung. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), pages 2355-2361, July 2018.
    Training models to generate data has increasingly attracted research attention and become important in modern-world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of the minimal enclosing ball to train a generator G(z) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring data generated are also lying on the data manifold learned from training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely a simple and easy-to-control optimization formulation, avoidance of mode collapse, and efficient learning of the data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthetic and real-world datasets to illustrate the behaviors, strengths and weaknesses of our proposed GEN, in particular its ability to handle multi-modal data and the quality of generated data.
    @INPROCEEDINGS { le_etal_ijcai18_geometric,
        AUTHOR = { Trung Le and Hung Vu and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { Geometric enclosing networks },
        BOOKTITLE = { Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, {IJCAI-18} },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        PAGES = { 2355--2361 },
        YEAR = { 2018 },
        MONTH = { July },
        ABSTRACT = { Training models to generate data has increasingly attracted research attention and become important in modern-world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of the minimal enclosing ball to train a generator G(z) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring that generated data also lie on the data manifold learned from training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely a simple and easy-to-control optimization formulation, avoidance of mode collapse, and efficient learning of the data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthetic and real-world datasets to illustrate the behaviors, strengths and weaknesses of our proposed GEN, in particular its ability to handle multi-modal data and the quality of generated data. },
        FILE = { :le_etal_ijcai18_geometric - Geometric Enclosing Networks.pdf:PDF },
    }
C
  • A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network
    Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2018. [ | | pdf]
    We introduce ConvKB, a novel embedding model for the knowledge base completion task. Our approach advances the state-of-the-art (SOTA) by employing a convolutional neural network (CNN) for the task, which can capture global relationships and transitional characteristics. We represent each triple (head entity, relation, tail entity) as a 3-column matrix which is the input for the convolution layer. Different filters of the same 1x3 shape are applied over the input matrix to produce different feature maps, which are then concatenated into a single feature vector. This vector is used to return a score for the triple via a dot product. The returned score is used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction results than previous SOTA models on the two current benchmark datasets WN18RR and FB15k-237.
    @INPROCEEDINGS { nguyen_etal_naacl18_anovelembedding,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung },
        TITLE = { A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network },
        BOOKTITLE = { Proc. of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL) },
        YEAR = { 2018 },
        ABSTRACT = { We introduce ConvKB, a novel embedding model for the knowledge base completion task. Our approach advances the state-of-the-art (SOTA) by employing a convolutional neural network (CNN) for the task, which can capture global relationships and transitional characteristics. We represent each triple (head entity, relation, tail entity) as a 3-column matrix which is the input for the convolution layer. Different filters of the same 1x3 shape are applied over the input matrix to produce different feature maps, which are then concatenated into a single feature vector. This vector is used to return a score for the triple via a dot product. The returned score is used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction results than previous SOTA models on the two current benchmark datasets WN18RR and FB15k-237. },
        FILE = { :nguyen_etal_naacl18_anovelembedding - A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network.pdf:PDF },
        URL = { https://arxiv.org/abs/1712.02121 },
    }
C
  • Learning Graph Representation via Frequent Subgraphs
    Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh and Dinh Phung. In Proc. of SIAM Int. Conf. on Data Mining (SDM), 2018. (Student travel award). [ | ]
    @INPROCEEDINGS { nguyen_etal_sdm18_learning,
        AUTHOR = { Dang Nguyen and Wei Luo and Tu Dinh Nguyen and Svetha Venkatesh and Dinh Phung },
        TITLE = { Learning Graph Representation via Frequent Subgraphs },
        BOOKTITLE = { Proc. of SIAM Int. Conf. on Data Mining (SDM) },
        YEAR = { 2018 },
        PUBLISHER = { SIAM },
        NOTE = { Student travel award },
        FILE = { :nguyen_etal_sdm18_learning - Learning Graph Representation Via Frequent Subgraphs.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2018.01.12 },
    }
C
  • Model-Based Learning for Point Pattern Data
    Ba-Ngu Vo, Nhan Dam, Dinh Phung, Quang N. Tran and Ba-Tuong Vo. Pattern Recognition (PR), 84:136-151, 2018. [ | | pdf]
    This article proposes a framework for model-based point pattern learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed.
    @ARTICLE { vo_etal_pr18_modelbased,
        AUTHOR = { Ba-Ngu Vo and Nhan Dam and Dinh Phung and Quang N. Tran and Ba-Tuong Vo },
        JOURNAL = { Pattern Recognition (PR) },
        TITLE = { Model-Based Learning for Point Pattern Data },
        YEAR = { 2018 },
        ISSN = { 0031-3203 },
        PAGES = { 136--151 },
        VOLUME = { 84 },
        ABSTRACT = { This article proposes a framework for model-based point pattern learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed. },
        DOI = { https://doi.org/10.1016/j.patcog.2018.07.008 },
        FILE = { :vo_etal_pr18_modelbased - Model Based Learning for Point Pattern Data.pdf:PDF },
        KEYWORDS = { Point pattern, Point process, Random finite set, Multiple instance learning, Classification, Novelty detection, Clustering },
        PUBLISHER = { Elsevier },
        URL = { http://www.sciencedirect.com/science/article/pii/S0031320318302395 },
    }
J
  • Dual Discriminator Generative Adversarial Nets
    Tu Dinh Nguyen, Trung Le, Hung Vu and Dinh Phung. In Proc. of the 31st Int. Conf. on Neural Information Processing Systems (NIPS), pages 2667-2677, USA, 2017. [ | | pdf]
    We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial networks (GANs). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus exploiting the complementary statistical properties of these divergences to effectively diversify the estimated density in capturing multiple modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; together with a generator, it retains the analogy of a minimax game, wherein one discriminator rewards high scores for samples from the data distribution whilst the other, conversely, favors data from the generator, and the generator produces data to fool both discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both the KL and reverse KL divergences between the data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode collapsing problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach in generating good-quality and diverse samples over baselines, and the capability of our method to scale up to the ImageNet database.
    @INPROCEEDINGS { tu_etal_nips17_d2gan,
        AUTHOR = { Tu Dinh Nguyen and Trung Le and Hung Vu and Dinh Phung },
        TITLE = { Dual Discriminator Generative Adversarial Nets },
        BOOKTITLE = { Proc. of the 31st Int. Conf. on Neural Information Processing Systems (NIPS) },
        YEAR = { 2017 },
        SERIES = { NIPS'17 },
        PAGES = { 2667--2677 },
        ADDRESS = { USA },
        PUBLISHER = { Curran Associates Inc. },
        ABSTRACT = { We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial networks (GANs). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus exploiting the complementary statistical properties of these divergences to effectively diversify the estimated density in capturing multiple modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; together with a generator, it retains the analogy of a minimax game, wherein one discriminator rewards high scores for samples from the data distribution whilst the other, conversely, favors data from the generator, and the generator produces data to fool both discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both the KL and reverse KL divergences between the data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode collapsing problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach in generating good-quality and diverse samples over baselines, and the capability of our method to scale up to the ImageNet database. },
        ACMID = { 3295027 },
        FILE = { :tu_etal_nips17_d2gan - Dual Discriminator Generative Adversarial Nets.pdf:PDF },
        ISBN = { 978-1-5108-6096-4 },
        LOCATION = { Long Beach, California, USA },
        NUMPAGES = { 11 },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.09.06 },
        URL = { http://dl.acm.org/citation.cfm?id=3294996.3295027 },
    }
C
  • GoGP: Fast Online Regression with Gaussian Processes
    Trung Le, Khanh Nguyen, Vu Nguyen, Tu Dinh Nguyen and Dinh Phung. In International Conference on Data Mining (ICDM), 2017. [ | ]
    One of the most challenging current problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that can scale with massive datasets. Our approach is formulated based on an alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We develop theory to guarantee that, with a good convergence rate, our proposed algorithm always produces a (sparse) solution which is close to the true optimum to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately to streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for the online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared with its rivals under the online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors.
    @INPROCEEDINGS { le_etal_icdm17_gogp,
        AUTHOR = { Trung Le and Khanh Nguyen and Vu Nguyen and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { {GoGP}: Fast Online Regression with Gaussian Processes },
        BOOKTITLE = { International Conference on Data Mining (ICDM) },
        YEAR = { 2017 },
        ABSTRACT = { One of the most challenging current problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that can scale with massive datasets. Our approach is formulated based on an alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We develop theory to guarantee that, with a good convergence rate, our proposed algorithm always produces a (sparse) solution which is close to the true optimum to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately to streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for the online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared with its rivals under the online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors. },
        FILE = { :le_etal_icdm17_gogp - GoGP_ Fast Online Regression with Gaussian Processes.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.09.01 },
    }
C
  • Supervised Restricted Boltzmann Machines
    Tu Dinh Nguyen, Dinh Phung, Viet Huynh and Trung Le. In Proc. of the Conference on Uncertainty in Artificial Intelligence (UAI), 2017. [ | | pdf]
    We propose in this paper the supervised restricted Boltzmann machine (sRBM), a unified framework which combines the versatility of the RBM to simultaneously learn the data representation and to perform supervised learning (i.e., a nonlinear classifier or a nonlinear regressor). Unlike the current state-of-the-art classification formulation proposed for RBMs in (Larochelle et al., 2012), our model is a hybrid probabilistic graphical model consisting of a distinguished generative component for data representation and a discriminative component for prediction. While the work of (Larochelle et al., 2012) typically incurs no extra difficulty in inference compared with a standard RBM, our discriminative component, modeled as a directed graphical model, renders MCMC-based inference (e.g., a Gibbs sampler) very slow and impractical for use. To this end, we further develop scalable variational inference for the proposed sRBM for both classification and regression cases. Extensive experiments on real-world datasets show that our sRBM achieves better predictive performance than baseline methods. At the same time, our proposed framework yields learned representations which are more discriminative, hence interpretable, than those of its counterparts. Besides, our method is probabilistic and capable of generating meaningful data conditioned on specific classes – a topic which is of great current interest in deep learning, aiming at data generation.
    @INPROCEEDINGS { nguyen_etal_uai17supervised,
        AUTHOR = { Tu Dinh Nguyen and Dinh Phung and Viet Huynh and Trung Le },
        TITLE = { Supervised Restricted Boltzmann Machines },
        BOOKTITLE = { Proc. of the Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2017 },
        ABSTRACT = { We propose in this paper the supervised restricted Boltzmann machine (sRBM), a unified framework which combines the versatility of the RBM to simultaneously learn the data representation and to perform supervised learning (i.e., a nonlinear classifier or a nonlinear regressor). Unlike the current state-of-the-art classification formulation proposed for RBMs in (Larochelle et al., 2012), our model is a hybrid probabilistic graphical model consisting of a distinguished generative component for data representation and a discriminative component for prediction. While the work of (Larochelle et al., 2012) typically incurs no extra difficulty in inference compared with a standard RBM, our discriminative component, modeled as a directed graphical model, renders MCMC-based inference (e.g., a Gibbs sampler) very slow and impractical for use. To this end, we further develop scalable variational inference for the proposed sRBM for both classification and regression cases. Extensive experiments on real-world datasets show that our sRBM achieves better predictive performance than baseline methods. At the same time, our proposed framework yields learned representations which are more discriminative, hence interpretable, than those of its counterparts. Besides, our method is probabilistic and capable of generating meaningful data conditioned on specific classes – a topic which is of great current interest in deep learning, aiming at data generation. },
        FILE = { :nguyen_etal_uai17supervised - Supervised Restricted Boltzmann Machines.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.08.29 },
        URL = { http://auai.org/uai2017/proceedings/papers/106.pdf },
    }
C
  • Multilevel clustering via Wasserstein means
    Nhat Ho, XuanLong Nguyen, Mikhail Yurochkin, Hung Bui, Viet Huynh and Dinh Phung. In Proc. of the 34th International Conference on Machine Learning (ICML), pages 1501-1509, 2017. [ | | pdf]
    We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a large, hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with the Wasserstein distance metric. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. We also establish consistency properties enjoyed by our estimates of both local and global clusters. Finally, we present experimental results with both synthetic and real data to demonstrate the flexibility and scalability of the proposed approach.
    @INPROCEEDINGS { ho_etal_icml17multilevel,
        AUTHOR = { Nhat Ho and XuanLong Nguyen and Mikhail Yurochkin and Hung Bui and Viet Huynh and Dinh Phung },
        TITLE = { Multilevel clustering via {W}asserstein means },
        BOOKTITLE = { Proc. of the 34th International Conference on Machine Learning (ICML) },
        YEAR = { 2017 },
        VOLUME = { 70 },
        SERIES = { ICML'17 },
        PAGES = { 1501--1509 },
        PUBLISHER = { JMLR.org },
        ABSTRACT = { We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a large, hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with the Wasserstein distance metric. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. We also establish consistency properties enjoyed by our estimates of both local and global clusters. Finally, we present experimental results with both synthetic and real data to demonstrate the flexibility and scalability of the proposed approach. },
        ACMID = { 3305536 },
        FILE = { :ho_etal_icml17multilevel - Multilevel Clustering Via Wasserstein Means.pdf:PDF },
        LOCATION = { Sydney, NSW, Australia },
        NUMPAGES = { 9 },
        URL = { http://dl.acm.org/citation.cfm?id=3305381.3305536 },
    }
C
  • Approximation Vector Machines for Large-scale Online Learning
    Trung Le, Tu Dinh Nguyen, Vu Nguyen and Dinh Q. Phung. Journal of Machine Learning Research (JMLR), 2017. [ | | pdf]
    One of the most challenging problems in kernel online learning is to bound the model size and to promote model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. In this paper, we propose the Approximation Vector Machine (AVM), a model that can simultaneously encourage sparsity and safeguard against the risk of compromising the performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor, the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximation and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions, including Hinge, smooth Hinge, and Logistic for the classification task, and l1, l2, and ϵ-insensitive for the regression task. We conducted extensive experiments for the classification task in batch and online modes, and the regression task in online mode, over several benchmark datasets. The results show that our proposed AVM achieved comparable predictive performance with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM to maintain the model size.
    @ARTICLE { le_etal_jmlr17approximation,
        AUTHOR = { Trung Le and Tu Dinh Nguyen and Vu Nguyen and Dinh Q. Phung },
        TITLE = { Approximation Vector Machines for Large-scale Online Learning },
        JOURNAL = { Journal of Machine Learning Research (JMLR) },
        YEAR = { 2017 },
        ABSTRACT = { One of the most challenging problems in kernel online learning is to bound the model size and to promote model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade the performance. In this paper, we propose the Approximation Vector Machine (AVM), a model that can simultaneously encourage sparsity and safeguard against the risk of compromising the performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor, the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximation and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions, including Hinge, smooth Hinge, and Logistic for the classification task, and l1, l2, and ϵ-insensitive for the regression task. We conducted extensive experiments for the classification task in batch and online modes, and the regression task in online mode, over several benchmark datasets. The results show that our proposed AVM achieved comparable predictive performance with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM to maintain the model size. },
        FILE = { :le_etal_jmlr17approximation - Approximation Vector Machines for Large Scale Online Learning.pdf:PDF },
        KEYWORDS = { kernel, online learning, large-scale machine learning, sparsity, big data, core set, stochastic gradient descent, convergence analysis },
        URL = { https://arxiv.org/abs/1604.06518 },
    }
J
  • Discriminative Bayesian Nonparametric Clustering
    Vu Nguyen, Dinh Phung, Trung Le, Svetha Venkatesh and Hung Bui. In Proc. of International Joint Conference on Artificial Intelligence (IJCAI), 2017. [ | | pdf]
    We propose a general framework for discriminative Bayesian nonparametric clustering to promote inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels resulting from the probabilistic discriminative model. This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments on image clustering and dynamic location clustering demonstrate that, by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with traditional generative BNP models.
    @INPROCEEDINGS { nguyen_etal_ijcai17discriminative,
        AUTHOR = { Vu Nguyen and Dinh Phung and Trung Le and Svetha Venkatesh and Hung Bui },
        TITLE = { Discriminative Bayesian Nonparametric Clustering },
        BOOKTITLE = { Proc. of International Joint Conference on Artificial Intelligence (IJCAI) },
        YEAR = { 2017 },
        ABSTRACT = { We propose a general framework for discriminative Bayesian nonparametric clustering to promote inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels resulting from the probabilistic discriminative model. This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments on image clustering and dynamic location clustering demonstrate that, by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with traditional generative BNP models. },
        FILE = { :nguyen_etal_ijcai17discriminative - Discriminative Bayesian Nonparametric Clustering.pdf:PDF },
        URL = { https://www.ijcai.org/proceedings/2017/355 },
    }
C
  • Large-scale Online Kernel Learning with Random Feature Reparameterization
    Tu Dinh Nguyen, Trung Le, Hung Bui and Dinh Phung. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. [ | | pdf]
    A typical online kernel learning method faces two fundamental issues: the complexity in dealing with a huge number of observed data points (a.k.a. the curse of kernelization) and the difficulty in learning kernel parameters, which are often assumed to be fixed. Random Fourier features are a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bochner's theorem, and allow the model to be maintained directly in the random feature space with a fixed dimension, hence the model size remains constant w.r.t. data size. We further introduce in this paper the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning that addresses both aforementioned challenges. Our initial intuition comes from the so-called ‘reparameterization trick’ [Kingma and Welling, 2014] to lift the source of randomness of the Fourier components to another space which can be independently sampled, so that the stochastic gradient of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel and a new tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model under an online learning setting. We then conducted extensive experiments on several large-scale datasets where we demonstrate that our work achieves state-of-the-art performance in both learning efficacy and efficiency.
    @INPROCEEDINGS { tu_etal_ijcai17_rrf,
        AUTHOR = { Tu Dinh Nguyen and Trung Le and Hung Bui and Dinh Phung },
        BOOKTITLE = { Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI) },
        TITLE = { Large-scale Online Kernel Learning with Random Feature Reparameterization },
        YEAR = { 2017 },
        SERIES = { IJCAI'17 },
        ABSTRACT = { A typical online kernel learning method faces two fundamental issues: the complexity in dealing with a huge number of observed data points (a.k.a. the curse of kernelization) and the difficulty in learning kernel parameters, which are often assumed to be fixed. Random Fourier features are a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bochner's theorem, and allow the model to be maintained directly in the random feature space with a fixed dimension, hence the model size remains constant w.r.t. data size. We further introduce in this paper the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning that addresses both aforementioned challenges. Our initial intuition comes from the so-called ‘reparameterization trick’ [Kingma and Welling, 2014] to lift the source of randomness of the Fourier components to another space which can be independently sampled, so that the stochastic gradient of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel and a new tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model under an online learning setting. We then conducted extensive experiments on several large-scale datasets where we demonstrate that our work achieves state-of-the-art performance in both learning efficacy and efficiency. },
        FILE = { :tu_etal_ijcai17_rrf - Large Scale Online Kernel Learning with Random Feature Reparameterization.pdf:PDF },
        LOCATION = { Melbourne, Australia },
        NUMPAGES = { 7 },
        URL = { https://www.ijcai.org/proceedings/2017/354 },
    }
C
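The random Fourier feature machinery this paper builds on can be sketched in a few lines. The snippet below is a minimal illustration of the standard Rahimi–Recht construction for the RBF kernel via Bochner's theorem, not the paper's RRF reparameterization; the `gamma` value and feature count are illustrative choices.

```python
import numpy as np

def rff_features(X, n_features=1000, gamma=1.0, seed=None):
    """Map rows of X to random Fourier features approximating the RBF
    kernel k(x, y) = exp(-gamma * ||x - y||^2) via Bochner's theorem:
    the kernel's spectral density is Gaussian with covariance 2*gamma*I."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    # Inner products of these features concentrate around the true kernel.
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.array([[0.0, 0.0], [0.5, -0.5]])
Z = rff_features(X, n_features=20000, gamma=0.5, seed=0)
approx = Z @ Z.T  # approximates the exact Gram matrix
exact = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
```

Because the model lives in this fixed-dimensional feature space, its size stays constant as data streams in; the paper's contribution is making the kernel parameter (here `gamma`) itself learnable by reparameterizing the randomness in `W`.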
  • Hierarchical semi-Markov conditional random fields for deep recursive sequential data
    Truyen Tran, Dinh Phung, Hung H. Bui and Svetha Venkatesh. Artificial Intelligence (AIJ), Feb. 2017. [ | | pdf]
    We present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of linear-chain conditional random fields to model deep nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We develop numerical scaling procedures that handle the overflow problem. We show that the HSCRF can be reduced to the semi-Markov conditional random fields. Finally, we demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. The HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases.
    @ARTICLE { tran_etal_aij17hierarchical,
        AUTHOR = { Truyen Tran and Dinh Phung and Hung H. Bui and Svetha Venkatesh },
        TITLE = { Hierarchical semi-Markov conditional random fields for deep recursive sequential data },
        JOURNAL = { Artificial Intelligence (AIJ) },
        YEAR = { 2017 },
        MONTH = { Feb. },
        ABSTRACT = { We present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of linear-chain conditional random fields to model deep nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We develop numerical scaling procedures that handle the overflow problem. We show that the HSCRF can be reduced to the semi-Markov conditional random fields. Finally, we demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. The HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. },
        FILE = { :tran_etal_aij17hierarchical - Hierarchical Semi Markov Conditional Random Fields for Deep Recursive Sequential Data.pdf:PDF },
        KEYWORDS = { Deep nested sequential processes, Hierarchical semi-Markov conditional random field, Partial labelling, Constrained inference, Numerical scaling },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.02.21 },
        URL = { http://www.sciencedirect.com/science/article/pii/S0004370217300231 },
    }
J
  • See my thesis (chapter 5) for an equivalent directed graphical model, which is the precursor of this work and where I described the Asymmetric Inside-Outside (AIO) algorithm in detail. A brief version for the directed case also appeared in this AAAI'04 paper. The idea of semi-Markov duration modelling has also been addressed for the directed case in these CVPR05 and AIJ09 papers.
  • Column Networks for Collective Classification
    Pham, Trang, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In The Thirty-First AAAI Conference on Artificial Intelligence (AAAI), 2017. [ | | pdf]
    Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds great promise of better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, linear in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all applications, CLN demonstrates higher accuracy than state-of-the-art rivals.
    @CONFERENCE { pham_etal_aaai17column,
        AUTHOR = { Pham, Trang and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Column Networks for Collective Classification },
        BOOKTITLE = { The Thirty-First AAAI Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2017 },
        ABSTRACT = { Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds great promise of better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, linear in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all applications, CLN demonstrates higher accuracy than state-of-the-art rivals. },
        COMMENT = { Accepted },
        FILE = { :pham_etal_aaai17column - Column Networks for Collective Classification.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.11.14 },
        URL = { https://arxiv.org/abs/1609.04508 },
    }
C
  • Dual Space Gradient Descent for Online Learning
    Le, Trung, Nguyen, Tu Dinh, Nguyen, Vu and Phung, Dinh. In Advances in Neural Information Processing (NIPS), December 2016. [ | | pdf]
    One crucial goal in kernel online learning is to bound the model size. Common approaches employ budget maintenance procedures to restrict the model size using removal, projection, or merging strategies. Although projection and merging are known in the literature to be the most effective strategies, they demand extensive computation, whilst the removal strategy fails to retain information from the removed vectors. An alternative way to address the model size problem is to apply random features to approximate the kernel function. This allows the model to be maintained directly in the random feature space, hence effectively resolving the curse of kernelization. However, this approach still suffers from a serious shortcoming: it needs a high-dimensional random feature space to achieve a sufficiently accurate kernel approximation, which leads to a significant increase in computational cost. To address all of these challenges, we present in this paper the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance. Consequently, our approach permits the budget to be maintained in a simple, direct and elegant way while simultaneously mitigating the impact of the dimensionality issue on learning performance. We further provide convergence analysis and conduct extensive experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with state-of-the-art baselines.
    @CONFERENCE { le_etal_nips16dual,
        AUTHOR = { Le, Trung and Nguyen, Tu Dinh and Nguyen, Vu and Phung, Dinh },
        TITLE = { Dual Space Gradient Descent for Online Learning },
        BOOKTITLE = { Advances in Neural Information Processing (NIPS) },
        YEAR = { 2016 },
        MONTH = { December },
        ABSTRACT = { One crucial goal in kernel online learning is to bound the model size. Common approaches employ budget maintenance procedures to restrict the model size using removal, projection, or merging strategies. Although projection and merging are known in the literature to be the most effective strategies, they demand extensive computation, whilst the removal strategy fails to retain information from the removed vectors. An alternative way to address the model size problem is to apply random features to approximate the kernel function. This allows the model to be maintained directly in the random feature space, hence effectively resolving the curse of kernelization. However, this approach still suffers from a serious shortcoming: it needs a high-dimensional random feature space to achieve a sufficiently accurate kernel approximation, which leads to a significant increase in computational cost. To address all of these challenges, we present in this paper the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance. Consequently, our approach permits the budget to be maintained in a simple, direct and elegant way while simultaneously mitigating the impact of the dimensionality issue on learning performance. We further provide convergence analysis and conduct extensive experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with state-of-the-art baselines. },
        FILE = { :le_etal_nips16dual - Dual Space Gradient Descent for Online Learning.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.16 },
        URL = { https://papers.nips.cc/paper/6560-dual-space-gradient-descent-for-online-learning.pdf },
    }
C
  • Scalable Nonparametric Bayesian Multilevel Clustering
    Viet Huynh, Dinh Phung, Svetha Venkatesh, Xuan-Long Nguyen, Matt Hoffman and Hung Bui. In Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI), pages 289-298, June 2016. [ | | pdf]
    @CONFERENCE { huynh_phung_venkatesh_nguyen_hoffman_bui_uai16scalable,
        AUTHOR = { Viet Huynh and Dinh Phung and Svetha Venkatesh and Xuan-Long Nguyen and Matt Hoffman and Hung Bui },
        TITLE = { Scalable Nonparametric {B}ayesian Multilevel Clustering },
        BOOKTITLE = { Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2016 },
        MONTH = { June },
        PUBLISHER = { AUAI Press },
        PAGES = { 289--298 },
        FILE = { :huynh_phung_venkatesh_nguyen_hoffman_bui_uai16scalable - Scalable Nonparametric Bayesian Multilevel Clustering.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.09 },
        URL = { http://auai.org/uai2016/proceedings/papers/262.pdf },
    }
C
  • Budgeted Semi-supervised Support Vector Machine
    Le, Trung, Duong, Phuong, Dinh, Mi, Nguyen, Tu, Nguyen, Vu and Phung, Dinh. In 32nd Conference on Uncertainty in Artificial Intelligence (UAI), June 2016. [ | | pdf]
    @CONFERENCE { le_duong_dinh_nguyen_nguyen_phung_uai16budgeted,
        AUTHOR = { Le, Trung and Duong, Phuong and Dinh, Mi and Nguyen, Tu and Nguyen, Vu and Phung, Dinh },
        TITLE = { Budgeted Semi-supervised {S}upport {V}ector {M}achine },
        BOOKTITLE = { 32nd Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2016 },
        MONTH = { June },
        FILE = { :le_duong_dinh_nguyen_nguyen_phung_uai16budgeted - Budgeted Semi Supervised Support Vector Machine.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.09 },
        URL = { http://auai.org/uai2016/proceedings/papers/110.pdf },
    }
C
  • Nonparametric Budgeted Stochastic Gradient Descent
    Le, Trung, Nguyen, Vu, Nguyen, Tu Dinh and Phung, Dinh. In 19th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS), May 2016. [ | | pdf]
    @CONFERENCE { le_nguyen_phung_aistats16nonparametric,
        AUTHOR = { Le, Trung and Nguyen, Vu and Nguyen, Tu Dinh and Phung, Dinh },
        TITLE = { Nonparametric Budgeted Stochastic Gradient Descent },
        BOOKTITLE = { 19th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2016 },
        MONTH = { May },
        FILE = { :le_nguyen_phung_aistats16nonparametric - Nonparametric Budgeted Stochastic Gradient Descent.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://www.jmlr.org/proceedings/papers/v51/le16.pdf },
    }
C
  • One-Pass Logistic Regression for Label-Drift and Large-Scale Classification on Distributed Systems
    Nguyen, Vu, Nguyen, Tu Dinh, Le, Trung, Phung, Dinh and Venkatesh, Svetha. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 1113-1118, Dec 2016. [ | | pdf | code]
    Logistic regression (LR) for classification is the workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where the class labels are not known in advance, a problem we term label-drift classification. The label-drift classification problem naturally occurs in many applications, especially in streaming settings where the incoming data may contain samples categorized with new classes that have not been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to their running-time expense. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR), to offer a principled treatment for label-drift and large-scale classification. To handle large-scale classification for big data, we further extend OLR to a distributed setting for parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performances of our methods are comparable to or better than those of state-of-the-art baselines, whilst the execution time is faster by an order of magnitude. In addition, OLR and Spark-OLR are invariant to data shuffling and have no hyperparameters to tune, which significantly benefits data practitioners and overcomes the curse of big-data cross-validation for selecting optimal hyperparameters.
    @CONFERENCE { nguyen_etal_icdm16onepass,
        AUTHOR = { Nguyen, Vu and Nguyen, Tu Dinh and Le, Trung and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { One-Pass Logistic Regression for Label-Drift and Large-Scale Classification on Distributed Systems },
        BOOKTITLE = { 2016 IEEE 16th International Conference on Data Mining (ICDM) },
        YEAR = { 2016 },
        PAGES = { 1113-1118 },
        MONTH = { Dec },
        ABSTRACT = { Logistic regression (LR) for classification is the workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where the class labels are not known in advance, a problem we term label-drift classification. The label-drift classification problem naturally occurs in many applications, especially in streaming settings where the incoming data may contain samples categorized with new classes that have not been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to their running-time expense. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR), to offer a principled treatment for label-drift and large-scale classification. To handle large-scale classification for big data, we further extend OLR to a distributed setting for parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performances of our methods are comparable to or better than those of state-of-the-art baselines, whilst the execution time is faster by an order of magnitude. In addition, OLR and Spark-OLR are invariant to data shuffling and have no hyperparameters to tune, which significantly benefits data practitioners and overcomes the curse of big-data cross-validation for selecting optimal hyperparameters. },
        CODE = { https://github.com/ntienvu/ICDM2016_OLR },
        DOI = { 10.1109/ICDM.2016.0145 },
        FILE = { :nguyen_etal_icdm16onepass - One Pass Logistic Regression for Label Drift and Large Scale Classification on Distributed Systems.pdf:PDF },
        KEYWORDS = { Big Data;distributed processing;pattern classification;regression analysis;Big Data cross-validation;Spark-OLR;class labels;data shuffling;distributed systems;execution time;label-drift classification problem;large-scale classification;large-scale datasets;one-pass logistic regression;optimal hyperparameter selection;sparkling OLR;Bayes methods;Big data;Context;Data models;Estimation;Industries;Logistics;Apache Spark;Logistic regression;distributed system;label-drift;large-scale classification },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.09.10 },
        URL = { http://ieeexplore.ieee.org/document/7837958/ },
    }
C
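The label-drift setting described in this abstract, where classes never seen before arrive mid-stream, can be illustrated with a toy online softmax classifier that grows its weight matrix on the fly. This is a hedged sketch of the general idea only, not the paper's OLR or Spark-OLR algorithms; the class name, learning rate and labels are invented for illustration.

```python
import numpy as np

class LabelDriftClassifier:
    """Toy online softmax classifier that adds a class row the first
    time an unseen label arrives (illustrates label drift generically)."""

    def __init__(self, dim, lr=0.1):
        self.dim, self.lr = dim, lr
        self.labels = []                 # discovered labels, in arrival order
        self.W = np.zeros((0, dim))      # one weight row per known class

    def partial_fit(self, x, y):
        if y not in self.labels:         # label drift: grow the model
            self.labels.append(y)
            self.W = np.vstack([self.W, np.zeros(self.dim)])
        scores = self.W @ x
        p = np.exp(scores - scores.max())
        p /= p.sum()
        p[self.labels.index(y)] -= 1.0   # softmax cross-entropy gradient
        self.W -= self.lr * np.outer(p, x)

    def predict(self, x):
        return self.labels[int(np.argmax(self.W @ x))]

clf = LabelDriftClassifier(dim=2)
for _ in range(50):
    clf.partial_fit(np.array([1.0, 0.0]), "news")
    clf.partial_fit(np.array([0.0, 1.0]), "sports")  # class appears mid-stream
```

The point of the sketch is only that the model never needs the label set in advance; the paper's contribution is doing this in one pass, at scale, and on a distributed system.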
  • A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested Dirichlet Process
    Nguyen, T., Nguyen, V., Salim, F.D., Le, D.V. and Phung, D. Pervasive and Mobile Computing (PMC), 2016. [ | | pdf]
    Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches first extract the latent patterns that explain the human dynamics or behaviors, and then use them to consistently formulate numerical representations for community detection, often via a clustering method. While able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model is rooted in nested Dirichlet process theory, which allows nested structures to be built to summarize data at multiple levels. We demonstrate our framework on five datasets where the advantages of the proposed approach are validated.
    @ARTICLE { nguyen_nguyen_flora_le_phung_pmc16simultaneous,
        AUTHOR = { Nguyen, T. and Nguyen, V. and Salim, F.D. and Le, D.V. and Phung, D. },
        TITLE = { A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested {D}irichlet Process },
        JOURNAL = { Pervasive and Mobile Computing (PMC) },
        YEAR = { 2016 },
        ABSTRACT = { Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches first extract the latent patterns that explain the human dynamics or behaviors, and then use them to consistently formulate numerical representations for community detection, often via a clustering method. While able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model is rooted in nested Dirichlet process theory, which allows nested structures to be built to summarize data at multiple levels. We demonstrate our framework on five datasets where the advantages of the proposed approach are validated. },
        DOI = { http://dx.doi.org/10.1016/j.pmcj.2016.08.019 },
        FILE = { :nguyen_nguyen_flora_le_phung_pmc16simultaneous - A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested Dirichlet Process.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.17 },
        URL = { http://www.sciencedirect.com/science/article/pii/S1574119216302097 },
    }
J
  • Streaming Variational Inference for Dirichlet Process Mixtures
    Huynh, V., Phung, D. and Venkatesh, S. In 7th Asian Conference on Machine Learning (ACML), pages 237-252, Nov. 2015. [ | | pdf]
    Bayesian nonparametric models are theoretically well-suited to learning from streaming data, since their model complexity adapts to the volume of observed data. However, most existing variational inference algorithms are not applicable to streaming applications since they require truncation of the variational distributions. In this paper, we present two truncation-free variational algorithms, one for mixed-membership inference called TFVB (truncation-free variational Bayes), and the other for hard clustering inference called TFME (truncation-free maximization expectation). With these algorithms, we further develop a streaming learning framework for the popular Dirichlet process mixture (DPM) models. Our experiments demonstrate the usefulness of our framework on both synthetic and real-world data.
    @INPROCEEDINGS { huynh_phung_venkatesh_15streaming,
        AUTHOR = { Huynh, V. and Phung, D. and Venkatesh, S. },
        TITLE = { Streaming Variational Inference for {D}irichlet {P}rocess {M}ixtures },
        BOOKTITLE = { 7th Asian Conference on Machine Learning (ACML) },
        YEAR = { 2015 },
        PAGES = { 237--252 },
        MONTH = { Nov. },
        ABSTRACT = { Bayesian nonparametric models are theoretically well-suited to learning from streaming data, since their model complexity adapts to the volume of observed data. However, most existing variational inference algorithms are not applicable to streaming applications since they require truncation of the variational distributions. In this paper, we present two truncation-free variational algorithms, one for mixed-membership inference called TFVB (truncation-free variational Bayes), and the other for hard clustering inference called TFME (truncation-free maximization expectation). With these algorithms, we further develop a streaming learning framework for the popular Dirichlet process mixture (DPM) models. Our experiments demonstrate the usefulness of our framework on both synthetic and real-world data. },
        FILE = { :huynh_phung_venkatesh_15streaming - Streaming Variational Inference for Dirichlet Process Mixtures.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://www.jmlr.org/proceedings/papers/v45/Huynh15.pdf },
    }
C
  • Tensor-variate Restricted Boltzmann Machines
    Nguyen, Tu, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Proc. of AAAI Conf. on Artificial Intelligence (AAAI), pages 2887-2893, Austin Texas, USA , January 2015. [ | | pdf]
    Restricted Boltzmann Machines (RBMs) are an important class of latent variable models for representing vector data. An under-explored area is multimode data, where each data point is a matrix or a tensor. Standard RBMs applied to such data would require vectorizing matrices and tensors, thus resulting in unnecessarily high dimensionality and, at the same time, destroying the inherent higher-order interaction structures. This paper introduces Tensor-variate Restricted Boltzmann Machines (TvRBMs), which generalize RBMs to capture the multiplicative interaction between data modes and the latent variables. TvRBMs are highly compact in that the number of free parameters grows only linearly with the number of modes. We demonstrate the capacity of TvRBMs on three real-world applications: handwritten digit classification, face recognition and EEG-based alcoholic diagnosis. The learnt features of the model are more discriminative than the rivals, resulting in better classification performance.
    @INPROCEEDINGS { tu_truyen_phung_venkatesh_aaai15,
        TITLE = { Tensor-variate Restricted {B}oltzmann Machines },
        AUTHOR = { Nguyen, Tu and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of AAAI Conf. on Artificial Intelligence (AAAI) },
        YEAR = { 2015 },
        ADDRESS = { Austin Texas, USA },
        MONTH = { January },
        PAGES = { 2887--2893 },
        ABSTRACT = { Restricted Boltzmann Machines (RBMs) are an important class of latent variable models for representing vector data. An under-explored area is multimode data, where each data point is a matrix or a tensor. Standard RBMs applied to such data would require vectorizing matrices and tensors, thus resulting in unnecessarily high dimensionality and, at the same time, destroying the inherent higher-order interaction structures. This paper introduces Tensor-variate Restricted Boltzmann Machines (TvRBMs), which generalize RBMs to capture the multiplicative interaction between data modes and the latent variables. TvRBMs are highly compact in that the number of free parameters grows only linearly with the number of modes. We demonstrate the capacity of TvRBMs on three real-world applications: handwritten digit classification, face recognition and EEG-based alcoholic diagnosis. The learnt features of the model are more discriminative than the rivals, resulting in better classification performance. },
        KEYWORDS = { tensor; rbm; restricted boltzmann machine; tvrbm; multiplicative interaction; eeg; },
        OWNER = { ngtu },
        TIMESTAMP = { 2015.01.29 },
        URL = { http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9371 },
    }
C
  • Learning Latent Activities from Social Signals with Hierarchical Dirichlet Process
    Phung, Dinh, Nguyen, T. C., Gupta, S. and Venkatesh, Svetha. In Handbook on Plan, Activity, and Intent Recognition, pages 149-174.Elsevier, , 2014. [ | | pdf | code]
    Understanding human activities is an important research topic, noticeably in assisted living and health monitoring. Beyond simple forms of activity (e.g., the RFID event of entering a building), learning latent activities that are more semantically interpretable, such as sitting at a desk, meeting with people or gathering with friends, remains a challenging problem. Supervised learning has been the typical modeling choice in the past. However, this requires labeled training data, is unable to predict never-seen-before activities and fails to adapt to the continuing growth of data over time. In this chapter, we explore Bayesian nonparametric methods, in particular the Hierarchical Dirichlet Process, to infer latent activities from sensor data acquired in a pervasive setting. Our framework is unsupervised, requires no labeled data and is able to discover new activities as data grows. We present experiments on extracting movement and interaction activities from sociometric badge signals and show how to use them for detection of sub-communities. Using the popular Reality Mining dataset, we further demonstrate the extraction of co-location activities and use them to automatically infer the structure of social subgroups.
    @INCOLLECTION { phung_nguyen_gupta_venkatesh_pair14,
        TITLE = { Learning Latent Activities from Social Signals with Hierarchical {D}irichlet Process },
        AUTHOR = { Phung, Dinh and Nguyen, T. C. and Gupta, S. and Venkatesh, Svetha },
        BOOKTITLE = { Handbook on Plan, Activity, and Intent Recognition },
        PUBLISHER = { Elsevier },
        YEAR = { 2014 },
        EDITOR = { Gita Sukthankar and Christopher Geib and David V. Pynadath and Hung Bui and Robert P. Goldman },
        PAGES = { 149--174 },
        ABSTRACT = { Understanding human activities is an important research topic, noticeably in assisted living and health monitoring. Beyond simple forms of activity (e.g., the RFID event of entering a building), learning latent activities that are more semantically interpretable, such as sitting at a desk, meeting with people or gathering with friends, remains a challenging problem. Supervised learning has been the typical modeling choice in the past. However, this requires labeled training data, is unable to predict never-seen-before activities and fails to adapt to the continuing growth of data over time. In this chapter, we explore Bayesian nonparametric methods, in particular the Hierarchical Dirichlet Process, to infer latent activities from sensor data acquired in a pervasive setting. Our framework is unsupervised, requires no labeled data and is able to discover new activities as data grows. We present experiments on extracting movement and interaction activities from sociometric badge signals and show how to use them for detection of sub-communities. Using the popular Reality Mining dataset, we further demonstrate the extraction of co-location activities and use them to automatically infer the structure of social subgroups. },
        CODE = { http://prada-research.net/~dinh/index.php?n=Main.Code#HDP_code },
        OWNER = { ctng },
        TIMESTAMP = { 2013.07.25 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/Phung_etal_pair14.pdf },
    }
BC
  • A Random Finite Set Model for Data Clustering
    Phung, Dinh and Vo, Ba-Ngu. In Proc. of Intl. Conf. on Fusion (FUSION), Salamanca, Spain, July 2014. [ | | pdf]
    The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a point pattern or a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is not available in advance. This paper proposes a new class of models for data clustering that addresses set-valued data as well as an unknown number of clusters, using a Dirichlet Process mixture of Poisson random finite sets. We also develop an efficient Markov Chain Monte Carlo posterior inference technique that can learn the number of clusters and mixture parameters automatically from the data. Numerical studies are presented to demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in data.
    @INPROCEEDINGS { phung_vo_fusion14,
        TITLE = { A Random Finite Set Model for Data Clustering },
        AUTHOR = { Phung, Dinh and Vo, Ba-Ngu },
        BOOKTITLE = { Proc. of Intl. Conf. on Fusion (FUSION) },
        YEAR = { 2014 },
        ADDRESS = { Salamanca, Spain },
        MONTH = { July },
        ABSTRACT = { The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a point pattern or a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is not available in advance. This paper proposes a new class of models for data clustering that addresses set-valued data as well as an unknown number of clusters, using a Dirichlet Process mixture of Poisson random finite sets. We also develop an efficient Markov Chain Monte Carlo posterior inference technique that can learn the number of clusters and mixture parameters automatically from the data. Numerical studies are presented to demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in data. },
        OWNER = { dinh },
        TIMESTAMP = { 2014.05.16 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/phung_vo_fusion14.pdf },
    }
C
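The entry above infers the number of clusters with a Dirichlet process mixture over set-valued data. A much-reduced sketch of that idea, assuming only the cardinality |X| of each point pattern is modelled and using a conjugate Gamma prior on each cluster's Poisson rate (all names and parameters here are our own, not the paper's), is a collapsed CRP Gibbs sampler:

```python
import math, random

# Toy sketch (our simplification, not the paper's full model): cluster
# point patterns by cardinality |X| under a Dirichlet-process mixture of
# Poisson models. A conjugate Gamma(a, b) prior on each cluster's Poisson
# rate gives a Negative-Binomial predictive, so the number of clusters is
# inferred by collapsed (CRP) Gibbs sampling, echoing the paper's MCMC.

def nb_predictive(n, a, b):
    """P(|X| = n) after integrating the Poisson rate against Gamma(a, b)."""
    return math.exp(math.lgamma(a + n) - math.lgamma(a) - math.lgamma(n + 1)
                    + a * math.log(b / (b + 1.0)) - n * math.log(b + 1.0))

def crp_gibbs(cards, alpha=1.0, a=2.0, b=1.0, iters=50, seed=0):
    rng = random.Random(seed)
    z = list(range(len(cards)))            # start with singleton clusters
    for _ in range(iters):
        for i, n in enumerate(cards):
            members = {}                   # cluster -> cardinalities, excluding i
            for j, zj in enumerate(z):
                if j != i:
                    members.setdefault(zj, []).append(cards[j])
            labels, weights = [], []
            for k, ns in members.items():  # join an existing cluster ...
                labels.append(k)
                weights.append(len(ns) * nb_predictive(n, a + sum(ns), b + len(ns)))
            labels.append(max(z) + 1)      # ... or open a fresh one
            weights.append(alpha * nb_predictive(n, a, b))
            r, acc = rng.random() * sum(weights), 0.0
            for k, w in zip(labels, weights):
                acc += w
                if r <= acc:
                    z[i] = k
                    break
    return z
```

On cardinalities drawn from two well-separated rates the sampler reliably keeps the two groups apart, while the number of clusters remains a posterior quantity rather than an input.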
  • Labeled Random Finite Sets and the Bayes Multi-target Tracking Filter
    Vo, Ba-Ngu, Vo, Ba-Tuong and Phung, Dinh. IEEE Transactions on Signal Processing (TSP), 62(24):6554-6567, 2014. [ | ]
    @ARTICLE { vo_vo_phung_tsp14,
        AUTHOR = { Vo, Ba-Ngu and Vo, Ba-Tuong and Phung, Dinh },
        TITLE = { Labeled Random Finite Sets and the Bayes Multi-target Tracking Filter },
        JOURNAL = { IEEE Transactions on Signal Processing (TSP) },
        YEAR = { 2014 },
        VOLUME = { 62 },
        NUMBER = { 24 },
        PAGES = { 6554--6567 },
        FILE = { :vo_vo_phung_tsp14 - Labeled Random Finite Sets and the Bayes Multi Target Tracking Filter.pdf:PDF },
        OWNER = { dinh },
        TIMESTAMP = { 2014.07.02 },
    }
J
  • Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts
    Vu Nguyen, Phung, Dinh, XuanLong Nguyen, Venkatesh, Svetha and Hung Bui. In Proc. of Intl. Conf. on Machine Learning (ICML), pages 288-296, 2014. [ | ]
    @INPROCEEDINGS { nguyen_phung_nguyen_venkatesh_bui_icml14,
        TITLE = { Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts },
        AUTHOR = { Vu Nguyen and Phung, Dinh and XuanLong Nguyen and Venkatesh, Svetha and Hung Bui },
        BOOKTITLE = { Proc. of Intl. Conf. on Machine Learning (ICML) },
        YEAR = { 2014 },
        PAGES = { 288--296 },
        OWNER = { tvnguye },
        TIMESTAMP = { 2013.12.13 },
    }
C
  • Keeping up with Innovation: A Predictive Framework for Modeling Healthcare Data with Evolving Clinical Interventions
    Sunil Kumar Gupta, Santu Rana, Phung, Dinh and Venkatesh, Svetha. In Proc. of SIAM Intl. Conf. on Data Mining (SDM), pages 235-243, 2014. [ | | pdf]
    @INPROCEEDINGS { gupta_rana_phung_venkatesh_sdm14,
        TITLE = { Keeping up with Innovation: A Predictive Framework for Modeling Healthcare Data with Evolving Clinical Interventions },
        AUTHOR = { Sunil Kumar Gupta and Santu Rana and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of SIAM Intl. Conf. on Data Mining (SDM) },
        YEAR = { 2014 },
        PAGES = { 235--243 },
        CHAPTER = { 27 },
        DOI = { 10.1137/1.9781611973440.27 },
        EPRINT = { http://epubs.siam.org/doi/pdf/10.1137/1.9781611973440.27 },
        OWNER = { thinng },
        TIMESTAMP = { 2015.01.28 },
        URL = { http://epubs.siam.org/doi/abs/10.1137/1.9781611973440.27 },
    }
C
  • An Integrated Framework for Suicide Risk Prediction
    Tran, Truyen, Phung, Dinh, Luo, Wei, Harvey, R., Berk, M. and Venkatesh, Svetha. In Proc. of ACM Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD), Chicago, US, 2013. [ | ]
    @INPROCEEDINGS { tran_phung_luo_harvey_berk_venkatesh_kdd13,
        TITLE = { An Integrated Framework for Suicide Risk Prediction },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Luo, Wei and Harvey, R. and Berk, M. and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of ACM Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD) },
        YEAR = { 2013 },
        ADDRESS = { Chicago, US },
        OWNER = { Dinh },
        TIMESTAMP = { 2013.06.07 },
    }
C
  • Thurstonian Boltzmann Machines: Learning from Multiple Inequalities
    Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In International Conference on Machine Learning (ICML), Atlanta, USA, June 16-21 2013. [ | ]
    We introduce Thurstonian Boltzmann Machines (TBM), a unified architecture that can naturally incorporate a wide range of data inputs at the same time. It is based on the observation that many data types can be considered as being generated from a subset of underlying continuous variables, and each input value signifies several respective inequalities. Thus learning a TBM is essentially learning to make sense of a set of inequalities. The TBM naturally supports the following types: Gaussian, intervals, censored, binary, categorical, multi-categorical, ordinal, and (in)complete ranks with and without ties. We demonstrate the versatility and capacity of the proposed model on three applications of very different natures, namely handwritten digit recognition, collaborative filtering and complex survey analysis.
    @INPROCEEDINGS { tran_phung_venkatesh_icml13,
        TITLE = { {T}hurstonian {B}oltzmann Machines: Learning from Multiple Inequalities },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { International Conference on Machine Learning (ICML) },
        YEAR = { 2013 },
        ADDRESS = { Atlanta, USA },
        MONTH = { June 16-21 },
        ABSTRACT = { We introduce Thurstonian Boltzmann Machines (TBM), a unified architecture that can naturally incorporate a wide range of data inputs at the same time. It is based on the observation that many data types can be considered as being generated from a subset of underlying continuous variables, and each input value signifies several respective inequalities. Thus learning a TBM is essentially learning to make sense of a set of inequalities. The TBM naturally supports the following types: Gaussian, intervals, censored, binary, categorical, multi-categorical, ordinal, and (in)complete ranks with and without ties. We demonstrate the versatility and capacity of the proposed model on three applications of very different natures, namely handwritten digit recognition, collaborative filtering and complex survey analysis. },
        OWNER = { dinh },
        TIMESTAMP = { 2013.03.01 },
    }
C
  • Factorial Multi-Task Learning: A Bayesian Nonparametric Approach
    Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. In Proceedings of International Conference on Machine Learning (ICML), Atlanta, USA, June 16-21 2013. [ | ]
    Multi-task learning is a paradigm shown to improve the performance of related tasks through their joint learning. However, for real-world data, it is usually difficult to assess the task relatedness, and joint learning with unrelated tasks may lead to serious performance degradation. To this end, we propose a framework that groups the tasks based on their relatedness in a low-dimensional subspace and allows a varying degree of relatedness among tasks by sharing the subspace bases across the groups. This provides the flexibility of no sharing when two sets of tasks are unrelated and partial/total sharing when the tasks are related. Importantly, the number of task-groups and the subspace dimensionality are automatically inferred from the data. This keeps the model from being tied to a specific set of parameters. To realize our framework, we present a novel Bayesian nonparametric prior that extends the traditional hierarchical beta process prior using a Dirichlet process to permit a potentially infinite number of child beta processes. We apply our model for multi-task regression and classification applications. Experimental results using several synthetic and real-world datasets show the superiority of our model to other recent state-of-the-art multi-task learning methods.
    @INPROCEEDINGS { gupta_phung_venkatesh_icml13,
        TITLE = { Factorial Multi-Task Learning: A {B}ayesian Nonparametric Approach },
        AUTHOR = { Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proceedings of International Conference on Machine Learning (ICML) },
        YEAR = { 2013 },
        ADDRESS = { Atlanta, USA },
        MONTH = { June 16-21 },
        ABSTRACT = { Multi-task learning is a paradigm shown to improve the performance of related tasks through their joint learning. However, for real-world data, it is usually difficult to assess the task relatedness, and joint learning with unrelated tasks may lead to serious performance degradation. To this end, we propose a framework that groups the tasks based on their relatedness in a low-dimensional subspace and allows a varying degree of relatedness among tasks by sharing the subspace bases across the groups. This provides the flexibility of no sharing when two sets of tasks are unrelated and partial/total sharing when the tasks are related. Importantly, the number of task-groups and the subspace dimensionality are automatically inferred from the data. This keeps the model from being tied to a specific set of parameters. To realize our framework, we present a novel Bayesian nonparametric prior that extends the traditional hierarchical beta process prior using a Dirichlet process to permit a potentially infinite number of child beta processes. We apply our model for multi-task regression and classification applications. Experimental results using several synthetic and real-world datasets show the superiority of our model to other recent state-of-the-art multi-task learning methods. },
        OWNER = { Dinh },
        TIMESTAMP = { 2013.04.16 },
    }
C
  • Sparse Subspace Clustering via Group Sparse Coding
    Saha, B., Pham, D.S., Phung, Dinh and Venkatesh, Svetha. In Proc. of SIAM Intl. Conf. on Data Mining (SDM), pages 130-138, 2013. [ | ]
    Sparse subspace representation is an emerging and powerful approach for clustering data whose generative model is a union of subspaces. Existing sparse subspace representation methods are restricted to the single-task setting, which consequently leads to inefficient computation and sub-optimal performance. To address this limitation, we propose a novel method that regularizes sparse subspace representation by exploiting the structural sharing between tasks and data points. The first regularizer works at the group level, where we seek sparsity between groups but denseness within a group. The second regularizer models the interactions down to the data-point level via the well-known graph regularization technique. We also derive simple, provably convergent, and extremely computationally efficient algorithms for solving the proposed group formulations. We evaluate the proposed methods over a wide range of large-scale clustering problems, from challenging health care to image and text clustering benchmark datasets, and show that they outperform the state of the art considerably.
    @INPROCEEDINGS { saha_pham_phung_venkatesh_sdm13,
        TITLE = { Sparse Subspace Clustering via Group Sparse Coding },
        AUTHOR = { Saha, B. and Pham, D.S. and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of SIAM Intl. Conf. on Data Mining (SDM) },
        YEAR = { 2013 },
        PAGES = { 130--138 },
        ABSTRACT = { Sparse subspace representation is an emerging and powerful approach for clustering data whose generative model is a union of subspaces. Existing sparse subspace representation methods are restricted to the single-task setting, which consequently leads to inefficient computation and sub-optimal performance. To address this limitation, we propose a novel method that regularizes sparse subspace representation by exploiting the structural sharing between tasks and data points. The first regularizer works at the group level, where we seek sparsity between groups but denseness within a group. The second regularizer models the interactions down to the data-point level via the well-known graph regularization technique. We also derive simple, provably convergent, and extremely computationally efficient algorithms for solving the proposed group formulations. We evaluate the proposed methods over a wide range of large-scale clustering problems, from challenging health care to image and text clustering benchmark datasets, and show that they outperform the state of the art considerably. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
C
  • Bayesian Nonparametric Modelling of Correlated Data Sources and Applications (poster)
    Phung, Dinh. In International Conference on Bayesian Nonparametrics, Amsterdam, The Netherlands, June 10-14 2013. [ | | code | poster]
    When one considers realistic multimodal data, covariates are rich, and yet tend to have a natural correlation with one another; for example: tags and their associated multimedia contents; a patient's demographic information, medical history and drug usage; a social user's profile and friendship network. The presence of rich and naturally correlated covariates calls for the need to model their correlation with nonparametric models, without reverting to making parametric assumptions. This paper presents a fully Bayesian nonparametric approach to the problem of jointly clustering data sources and modeling their correlation. In our approach, we view context as distributions over some index space, governed by the topics discovered from the primary data source (content), and model both contents and contexts jointly. We impose a conditional structure in which contents provide the topics, upon which contexts are conditionally distributed. Distributions over topic parameters are modelled according to a Dirichlet process (DP). The stick-breaking representation gives rise to explicit realizations of topic atoms, which we use as an indexing mechanism to induce conditional random mixture distributions on the context observation spaces. Loosely speaking, we use a stochastic process, being a DP, to conditionally `index' other stochastic processes. The latter can be designed on any suitable family of stochastic processes to suit modelling needs or data types of contexts (such as Beta or Gaussian processes). The Dirichlet process is of course an obvious choice and will again be employed in this work. In typical hierarchical Bayesian style, we also provide the model in the grouped-data setting, where contents and contexts appear in groups (for example, a collection of text documents or images embedded in time or space). Our model can be viewed as a generalization of the hierarchical Dirichlet process (HDP) [2] and the recent nested Dirichlet process (nDP) [1].
We develop an auxiliary conditional Gibbs sampler in which both topic and context atoms are marginalized out. We demonstrate the framework on synthetic datasets and various real large-scale datasets with an emphasis on the application perspective of the models. In particular, we demonstrate a) an application in text modelling for modelling topics which are sensitive to time using the NIPS and PNAS datasets, b) an application of the model in computer vision for inferring local and global movement patterns using the MIT dataset consisting of real video data collected at a traffic scene, and c) an application to medical data analysis in which we model latent aspects of diseases and their progression, together with the task of re-admission prediction.
    @INPROCEEDINGS { phung_bnp13,
        TITLE = { {B}ayesian Nonparametric Modelling of Correlated Data Sources and Applications (poster) },
        AUTHOR = { Phung, Dinh },
        BOOKTITLE = { International Conference on Bayesian Nonparametrics },
        YEAR = { 2013 },
        ADDRESS = { Amsterdam, The Netherlands },
        MONTH = { June 10-14 },
        ABSTRACT = { When one considers realistic multimodal data, covariates are rich, and yet tend to have a natural correlation with one another; for example: tags and their associated multimedia contents; a patient's demographic information, medical history and drug usage; a social user's profile and friendship network. The presence of rich and naturally correlated covariates calls for the need to model their correlation with nonparametric models, without reverting to making parametric assumptions. This paper presents a fully Bayesian nonparametric approach to the problem of jointly clustering data sources and modeling their correlation. In our approach, we view context as distributions over some index space, governed by the topics discovered from the primary data source (content), and model both contents and contexts jointly. We impose a conditional structure in which contents provide the topics, upon which contexts are conditionally distributed. Distributions over topic parameters are modelled according to a Dirichlet process (DP). The stick-breaking representation gives rise to explicit realizations of topic atoms, which we use as an indexing mechanism to induce conditional random mixture distributions on the context observation spaces. Loosely speaking, we use a stochastic process, being a DP, to conditionally `index' other stochastic processes. The latter can be designed on any suitable family of stochastic processes to suit modelling needs or data types of contexts (such as Beta or Gaussian processes). The Dirichlet process is of course an obvious choice and will again be employed in this work. In typical hierarchical Bayesian style, we also provide the model in the grouped-data setting, where contents and contexts appear in groups (for example, a collection of text documents or images embedded in time or space). Our model can be viewed as a generalization of the hierarchical Dirichlet process (HDP) [2] and the recent nested Dirichlet process (nDP) [1].
We develop an auxiliary conditional Gibbs sampler in which both topic and context atoms are marginalized out. We demonstrate the framework on synthetic datasets and various real large-scale datasets with an emphasis on the application perspective of the models. In particular, we demonstrate a) an application in text modelling for modelling topics which are sensitive to time using the NIPS and PNAS datasets, b) an application of the model in computer vision for inferring local and global movement patterns using the MIT dataset consisting of real video data collected at a traffic scene, and c) an application to medical data analysis in which we model latent aspects of diseases and their progression, together with the task of re-admission prediction. },
        CODE = { http://prada-research.net/~dinh/index.php?n=Main.Code#HDP_code },
        OWNER = { dinh },
        POSTER = { http://prada-research.net/~dinh/uploads/Main/Publications/A0_poster_BNP13.pdf },
        TIMESTAMP = { 2013.03.01 },
    }
C
  • Connectivity, Online Social Capital and Mood: A Bayesian Nonparametric Analysis
    Phung, Dinh, Gupta, S. K., Nguyen, T. and Venkatesh, Svetha. IEEE Transactions on Multimedia (TMM), 15:1316-1325, 2013. [ | | pdf]
    Social capital, indicative of community interaction and support, is intrinsically linked to mental health. Increasing online presence is now the norm. Whilst social capital and its impact on social networks have been examined, its underlying connection to emotional responses such as mood has not been investigated. This paper studies this phenomenon, revisiting the concept of “online social capital” in social media communities using measurable aspects of social participation and social support. We establish the link between online capital derived from social media and mood, demonstrating results for different cohorts of social capital and social connectivity. We use novel Bayesian nonparametric factor analysis to extract the shared and individual factors in mood transition across groups of users with different levels of connectivity, quantifying patterns and the degree of mood transitions. Using more than 1.6 million users from LiveJournal, we show quantitatively that groups with lower social capital have fewer positive moods and more negative moods than groups with higher social capital. We show similar effects in mood transitions. We establish a framework for how social media can be used as a barometer for mood. The significance lies in the importance of online social capital to overall mental well-being. In establishing the link between mood and social capital in online communities, this work may suggest the foundation of new systems to monitor online mental well-being.
    @ARTICLE { phung_gupta_nguyen_venkatesh_tmm13,
        TITLE = { Connectivity, Online Social Capital and Mood: A Bayesian Nonparametric Analysis },
        AUTHOR = { Phung, Dinh and Gupta, S. K. and Nguyen, T. and Venkatesh, Svetha },
        JOURNAL = { IEEE Transactions on Multimedia (TMM) },
        YEAR = { 2013 },
        PAGES = { 1316--1325 },
        VOLUME = { 15 },
        ABSTRACT = { Social capital, indicative of community interaction and support, is intrinsically linked to mental health. Increasing online presence is now the norm. Whilst social capital and its impact on social networks have been examined, its underlying connection to emotional responses such as mood has not been investigated. This paper studies this phenomenon, revisiting the concept of “online social capital” in social media communities using measurable aspects of social participation and social support. We establish the link between online capital derived from social media and mood, demonstrating results for different cohorts of social capital and social connectivity. We use novel Bayesian nonparametric factor analysis to extract the shared and individual factors in mood transition across groups of users with different levels of connectivity, quantifying patterns and the degree of mood transitions. Using more than 1.6 million users from LiveJournal, we show quantitatively that groups with lower social capital have fewer positive moods and more negative moods than groups with higher social capital. We show similar effects in mood transitions. We establish a framework for how social media can be used as a barometer for mood. The significance lies in the importance of online social capital to overall mental well-being. In establishing the link between mood and social capital in online communities, this work may suggest the foundation of new systems to monitor online mental well-being. },
        ISSN = { 0219-1377 },
        LANGUAGE = { English },
        TIMESTAMP = { 2013.04.16 },
        URL = { http://prada-research.net/~dinh/uploads/Main/HomePage/phung_gupta_nguyen_venkatesh_tmm13.pdf },
    }
J
  • Regularized nonnegative shared subspace learning
    Gupta, Sunil Kumar, Phung, Dinh, Adams, Brett and Venkatesh, Svetha. Data Mining and Knowledge Discovery, 26(1):57-97, 2013. [ | ]
    @ARTICLE { gupta_phung_adams_venkatesh_dami13,
        TITLE = { Regularized nonnegative shared subspace learning },
        AUTHOR = { Gupta, Sunil Kumar and Phung, Dinh and Adams, Brett and Venkatesh, Svetha },
        JOURNAL = { Data Mining and Knowledge Discovery },
        YEAR = { 2013 },
        NUMBER = { 1 },
        PAGES = { 57--97 },
        VOLUME = { 26 },
        OWNER = { thinng },
        PUBLISHER = { Springer },
        TIMESTAMP = { 2015.01.29 },
    }
J
  • A Slice Sampler for Restricted Hierarchical Beta Process with Applications to Shared Subspace Learning
    Gupta, S., Phung, Dinh and Venkatesh, Svetha. In Proc. of Intl. Conf. on Uncertainty in Artificial Intelligence (UAI), pages 316-325, 2012. [ | ]
    @INPROCEEDINGS { gupta_phung_venkatesh_uai12,
        TITLE = { A Slice Sampler for Restricted Hierarchical {B}eta Process with Applications to Shared Subspace Learning },
        AUTHOR = { Gupta, S. and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of Intl. Conf. on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2012 },
        PAGES = { 316--325 },
        OWNER = { dinh },
        TIMESTAMP = { 2012.05.24 },
    }
C
  • A Bayesian Nonparametric Joint Factor Model for Learning Shared and Individual Subspaces from Multiple Data Sources
    Gupta, S., Phung, Dinh and Venkatesh, Svetha. In Proc. of SIAM Intl. Conf. on Data Mining (SDM), pages 200-211, 2012. [ | ]
    Joint analysis of multiple data sources is becoming increasingly popular in transfer learning, multi-task learning and cross-domain data mining. One promising approach to model the data jointly is through learning the shared and individual factor subspaces. However, performance of this approach depends on the subspace dimensionalities and the level of sharing needs to be specified a priori. To this end, we propose a nonparametric joint factor analysis framework for modeling multiple related data sources. Our model utilizes the hierarchical beta process as a nonparametric prior to automatically infer the number of shared and individual factors. For posterior inference, we provide a Gibbs sampling scheme using auxiliary variables. The effectiveness of the proposed framework is validated through its application on two real world problems -- transfer learning in text and image retrieval.
    @INPROCEEDINGS { gupta_phung_venkatesh_sdm12,
        TITLE = { A {B}ayesian Nonparametric Joint Factor Model for Learning Shared and Individual Subspaces from Multiple Data Sources },
        AUTHOR = { Gupta, S. and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of SIAM Intl. Conf. on Data Mining (SDM) },
        YEAR = { 2012 },
        PAGES = { 200--211 },
        ABSTRACT = { Joint analysis of multiple data sources is becoming increasingly popular in transfer learning, multi-task learning and cross-domain data mining. One promising approach to model the data jointly is through learning the shared and individual factor subspaces. However, performance of this approach depends on the subspace dimensionalities and the level of sharing needs to be specified a priori. To this end, we propose a nonparametric joint factor analysis framework for modeling multiple related data sources. Our model utilizes the hierarchical beta process as a nonparametric prior to automatically infer the number of shared and individual factors. For posterior inference, we provide a Gibbs sampling scheme using auxiliary variables. The effectiveness of the proposed framework is validated through its application on two real world problems -- transfer learning in text and image retrieval. },
    }
C
  • A Sequential Decision Approach to Ordinal Preferences in Recommender Systems
    Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Proc. of AAAI Conf. on Artificial Intelligence (AAAI), pages 676-682, 2012. [ | ]
    We propose a novel sequential decision approach to modeling ordinal ratings in collaborative filtering problems. The rating process is assumed to start from the lowest level, evaluate against the latent utility at the corresponding level and move up until a suitable ordinal level is found. Crucial to this generative process are the underlying utility random variables that govern the generation of ratings and their modelling choices. To this end, we make novel use of the generalised extreme value distributions, which are found to be particularly suitable for our modeling tasks and, at the same time, facilitate our inference and learning procedure. The proposed approach is flexible enough to incorporate features from both the user and the item. We evaluate the proposed framework on three well-known datasets: MovieLens, Dating Agency and Netflix. In all cases, it is demonstrated that the proposed work is competitive against state-of-the-art collaborative filtering methods.
    @INPROCEEDINGS { truyen_phung_venkatesh_aaai12,
        TITLE = { A Sequential Decision Approach to Ordinal Preferences in Recommender Systems },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of AAAI Conf. on Artificial Intelligence (AAAI) },
        YEAR = { 2012 },
        PAGES = { 676--682 },
        ABSTRACT = { We propose a novel sequential decision approach to modeling ordinal ratings in collaborative filtering problems. The rating process is assumed to start from the lowest level, evaluate against the latent utility at the corresponding level and move up until a suitable ordinal level is found. Crucial to this generative process are the underlying utility random variables that govern the generation of ratings and their modelling choices. To this end, we make novel use of the generalised extreme value distributions, which are found to be particularly suitable for our modeling tasks and, at the same time, facilitate our inference and learning procedure. The proposed approach is flexible enough to incorporate features from both the user and the item. We evaluate the proposed framework on three well-known datasets: MovieLens, Dating Agency and Netflix. In all cases, it is demonstrated that the proposed work is competitive against state-of-the-art collaborative filtering methods. },
        TIMESTAMP = { 2012.04.11 },
    }
C
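The sequential rating process in the entry above can be mimicked with a toy generator (a hedged sketch under our own parameterisation; the paper's model uses generalised extreme value distributions with learned, feature-dependent utilities): the rater starts at the lowest level and keeps moving up while the latent utility clears each level's threshold, and with Gumbel-distributed noise in the comparison the per-level move-up probability is logistic.

```python
import math, random

# Toy sketch of a sequential ordinal-rating generator (our illustrative
# parameterisation, not the paper's). Starting at level 1, the rater moves
# up while the latent utility clears each level's threshold; a Gumbel-noise
# comparison yields a logistic per-level "move up" probability.

def sample_rating(utility, thresholds, rng):
    level = 1
    for b in thresholds:                       # one threshold per transition
        p_up = 1.0 / (1.0 + math.exp(b - utility))
        if rng.random() < p_up:
            level += 1                         # utility clears b: move up
        else:
            break                              # settle at the current level
    return level
```

Higher utilities push the sampled rating toward the top level, recovering the expected monotone link between preference strength and the ordinal outcome.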
  • Improved Subspace Clustering via Exploitation of Spatial Constraints
    Pham, S., Budhaditya, Saha, Phung, Dinh and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 550-557, 2012. [ | ]
    We present a novel approach to improving subspace clustering by exploiting spatial constraints. The new method encourages the sparse solution to be consistent with the spatial geometry of the tracked points by embedding weights into the sparse formulation. By doing so, we are able to correct sparse representations in a principled manner without introducing much additional computational cost. We discuss alternative ways to treat missing and corrupted data using the latest theory in robust lasso regression and suggest numerical algorithms to solve the proposed formulation. The experiments on the benchmark Johns Hopkins 155 dataset demonstrate that exploiting spatial constraints significantly improves motion segmentation.
    @INPROCEEDINGS { pham_budhaditya_phung_venkatesh_cvpr12,
        TITLE = { Improved Subspace Clustering via Exploitation of Spatial Constraints },
        AUTHOR = { Pham, S. and Budhaditya, Saha and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR) },
        YEAR = { 2012 },
        PAGES = { 550--557 },
        ABSTRACT = { We present a novel approach to improving subspace clustering by exploiting spatial constraints. The new method encourages the sparse solution to be consistent with the spatial geometry of the tracked points by embedding weights into the sparse formulation. By doing so, we are able to correct sparse representations in a principled manner without introducing much additional computational cost. We discuss alternative ways to treat missing and corrupted data using the latest theory in robust lasso regression and suggest numerical algorithms to solve the proposed formulation. The experiments on the benchmark Johns Hopkins 155 dataset demonstrate that exploiting spatial constraints significantly improves motion segmentation. },
        OWNER = { thinng },
        TIMESTAMP = { 2012.04.11 },
    }
C
  • Sparse Subspace Representation for Spectral Document Clustering
    Saha, B., Phung, Dinh, Pham, D.S. and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Data Mining (ICDM), pages 1092-1097, 2012. [ | ]
    @INPROCEEDINGS { saha_phung_pham_venkatesh_icdm12,
        TITLE = { Sparse Subspace Representation for Spectral Document Clustering },
        AUTHOR = { Saha, B. and Phung, Dinh and Pham, D.S. and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of IEEE Intl. Conf. on Data Mining (ICDM) },
        YEAR = { 2012 },
        PAGES = { 1092--1097 },
        OWNER = { dinh },
        TIMESTAMP = { 2012.10.31 },
    }
C
  • Detection of Cross-Channel Anomalies
    Pham, S., Budhaditya, Saha, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 35(1):33-59, 2013. [ | ]
    The data deluge has created a great challenge for data mining applications wherein the rare topics of interest are often buried in the flood of major headlines. We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual channel anomalies and are often a portent of significant events. Central to this new problem is the development of a theoretical foundation and methodology. Using the spectral approach, we propose a two-stage detection method: anomaly detection at a single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. We also derive an extension of the proposed detection method to an online setting, which automatically adapts to changes in the data over time at low computational complexity using incremental algorithms. Our mathematical analysis shows that our method is likely to reduce the false alarm rate by establishing theoretical results on the reduction of an impurity index. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in large-scale video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis.
    @ARTICLE { pham_budhaditya_phung_venkatesh_kais13,
        TITLE = { Detection of Cross-Channel Anomalies },
        AUTHOR = { Pham, S. and Budhaditya, Saha and Phung, Dinh and Venkatesh, Svetha },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2013 },
        NUMBER = { 1 },
        PAGES = { 33--59 },
        VOLUME = { 35 },
        ABSTRACT = { The data deluge has created a great challenge for data mining applications wherein the rare topics of interest are often buried in the flood of major headlines. We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual channel anomalies and are often a portent of significant events. Central to this new problem is the development of a theoretical foundation and methodology. Using the spectral approach, we propose a two-stage detection method: anomaly detection at a single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. We also derive an extension of the proposed detection method to an online setting, which automatically adapts to changes in the data over time at low computational complexity using incremental algorithms. Our mathematical analysis shows that our method is likely to reduce the false alarm rate by establishing theoretical results on the reduction of an impurity index. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in large-scale video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis. },
    }
J
  • Detection of Cross-Channel Anomalies From Multiple Data Channels
    Pham, S., Budhaditya, Saha, Phung, Dinh and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Data Mining (ICDM), Vancouver, Canada, December 2011. [ | ]
    We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual channel anomalies and are often a portent of significant events. Using spectral approaches, we propose a two-stage detection method: anomaly detection at a single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. Our mathematical analysis shows that our method is likely to reduce the false alarm rate. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis.
    @INPROCEEDINGS { pham_budhaditya_phung_venkatesh_icdm11,
        TITLE = { Detection of Cross-Channel Anomalies From Multiple Data Channels },
        AUTHOR = { Pham, S. and Budhaditya, Saha and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of IEEE Intl. Conf. on Data Mining (ICDM) },
        YEAR = { 2011 },
        ADDRESS = { Vancouver, Canada },
        MONTH = { December },
        ABSTRACT = { We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual channel anomalies and are often a portent of significant events. Using spectral approaches, we propose a two-stage detection method: anomaly detection at a single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. Our mathematical analysis shows that our method is likely to reduce the false alarm rate. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis. },
        COMMENT = { coauthor },
        OWNER = { thinng },
        TIMESTAMP = { 2012.04.11 },
    }
C
  • Probabilistic Models over Ordered Partitions with Applications in Document Ranking and Collaborative Filtering
    Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Proc. of SIAM Intl. Conf. on Data Mining (SDM), Arizona, USA, April 2011. [ | | pdf]
    Ranking is an important task for handling a large amount of content. Ideally, training data for supervised ranking would include a complete rank of documents (or other objects such as images or videos) for a particular query. However, this is only possible for small sets of documents. In practice, one often resorts to document rating, in which a subset of documents is assigned a small number indicating the degree of relevance. This poses a general problem of modelling and learning rank data with ties. In this paper, we propose a probabilistic generative model that models the process as permutations over partitions. This results in a super-exponential combinatorial state space with an unknown number of partitions and unknown ordering among them. We approach the problem from discrete choice theory, where subsets are chosen in a stagewise manner, significantly reducing the state space at each stage. Further, we show that with suitable parameterisation, we can still learn the models in linear time. We evaluate the proposed models on two application areas: (i) document ranking with the data from the recently held Yahoo! challenge, and (ii) collaborative filtering with movie data. The results demonstrate that the models are competitive against well-known rivals.
    @INPROCEEDINGS { truyen_phung_venkatesh_sdm11,
        TITLE = { Probabilistic Models over Ordered Partitions with Applications in Document Ranking and Collaborative Filtering },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of SIAM Intl. Conf. on Data Mining (SDM) },
        YEAR = { 2011 },
        ADDRESS = { Arizona, USA },
        MONTH = { April },
        ABSTRACT = { Ranking is an important task for handling a large amount of content. Ideally, training data for supervised ranking would include a complete rank of documents (or other objects such as images or videos) for a particular query. However, this is only possible for small sets of documents. In practice, one often resorts to document rating, in which a subset of documents is assigned a small number indicating the degree of relevance. This poses a general problem of modelling and learning rank data with ties. In this paper, we propose a probabilistic generative model that models the process as permutations over partitions. This results in a super-exponential combinatorial state space with an unknown number of partitions and unknown ordering among them. We approach the problem from discrete choice theory, where subsets are chosen in a stagewise manner, significantly reducing the state space at each stage. Further, we show that with suitable parameterisation, we can still learn the models in linear time. We evaluate the proposed models on two application areas: (i) document ranking with the data from the recently held Yahoo! challenge, and (ii) collaborative filtering with movie data. The results demonstrate that the models are competitive against well-known rivals. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\truyen_phung_venkatesh_sdm11.pdf:PDF },
        OWNER = { 184698H },
        TIMESTAMP = { 2011.02.07 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Truyen_etal_sdm11.pdf },
    }
C
  • Nonnegative Shared Subspace Learning and Its Application to Social Media Retrieval
    Gupta, Sunil, Phung, Dinh, Adams, Brett, Tran, Truyen and Venkatesh, Svetha. In Proc. of ACM Intl. Conf. on Knowledge Discovery and Data Mining (SIGKDD), Washington DC, USA, July 2010. [ | | pdf]
    Although tagging has become increasingly popular in online image and video sharing systems, tags are known to be noisy, ambiguous, incomplete and subjective. These factors can seriously affect the precision of a social tag-based web retrieval system. Therefore improving the precision performance of these social tag-based web retrieval systems has become an increasingly important research topic. To this end, we propose a shared subspace learning framework to leverage a secondary source to improve retrieval performance from a primary dataset. This is achieved by learning a shared subspace between the two sources under a joint Nonnegative Matrix Factorization in which the level of subspace sharing can be explicitly controlled. We derive an efficient algorithm for learning the factorization, analyze its complexity, and provide proof of convergence. We validate the framework on image and video retrieval tasks in which tags from the LabelMe dataset are used to improve image retrieval performance from a Flickr dataset and video retrieval performance from a YouTube dataset. This has implications for how to exploit and transfer knowledge from readily available auxiliary tagging resources to improve another social web retrieval system. Our shared subspace learning framework is applicable to a range of problems where one needs to exploit the strengths existing among multiple and heterogeneous datasets.
    @INPROCEEDINGS { gupta_phung_adams_truyen_venkatesh_sigkdd10,
        TITLE = { Nonnegative Shared Subspace Learning and Its Application to Social Media Retrieval },
        AUTHOR = { Gupta, Sunil and Phung, Dinh and Adams, Brett and Tran, Truyen and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of ACM Intl. Conf. on Knowledge Discovery and Data Mining (SIGKDD) },
        YEAR = { 2010 },
        ADDRESS = { Washington DC, USA },
        MONTH = { July },
        ABSTRACT = { Although tagging has become increasingly popular in online image and video sharing systems, tags are known to be noisy, ambiguous, incomplete and subjective. These factors can seriously affect the precision of a social tag-based web retrieval system. Therefore improving the precision performance of these social tag-based web retrieval systems has become an increasingly important research topic. To this end, we propose a shared subspace learning framework to leverage a secondary source to improve retrieval performance from a primary dataset. This is achieved by learning a shared subspace between the two sources under a joint Nonnegative Matrix Factorization in which the level of subspace sharing can be explicitly controlled. We derive an efficient algorithm for learning the factorization, analyze its complexity, and provide proof of convergence. We validate the framework on image and video retrieval tasks in which tags from the LabelMe dataset are used to improve image retrieval performance from a Flickr dataset and video retrieval performance from a YouTube dataset. This has implications for how to exploit and transfer knowledge from readily available auxiliary tagging resources to improve another social web retrieval system. Our shared subspace learning framework is applicable to a range of problems where one needs to exploit the strengths existing among multiple and heterogeneous datasets. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\gupta_phung_adams_truyen_venkatesh_sigkdd10.pdf:PDF },
        OWNER = { Dinh Phung },
        TIMESTAMP = { 2010.06.29 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Gupta_etal_sigkdd10.pdf },
    }
C
  • Efficient duration and hierarchical modeling for human activity recognition
    Duong, Thi, Phung, Dinh, Bui, Hung and Venkatesh, Svetha. Artificial Intelligence (AIJ), 173(7-8):830-856, 2009. [ | | pdf | code]
    A challenge in building pervasive and smart spaces is to learn and recognize human activities of daily living (ADLs). In this paper, we address this problem and argue that in dealing with ADLs, it is beneficial to exploit both their typical duration patterns and inherent hierarchical structures. We exploit efficient duration modeling using the novel Coxian distribution to form the Coxian hidden semi-Markov model (CxHSMM) and apply it to the problem of learning and recognizing ADLs with complex temporal dependencies. The Coxian duration model has several advantages over existing duration parameterizations using multinomial or exponential family distributions, including its denseness in the space of non-negative distributions, low number of parameters, computational efficiency and the existence of closed-form estimation solutions. Further, we combine both hierarchical and duration extensions of the hidden Markov model (HMM) to form the novel switching hidden semi-Markov model (SHSMM), and empirically compare its performance with existing models. The model can learn what an occupant normally does during the day from unsegmented training data and then perform online activity classification, segmentation and abnormality detection. Experimental results show that Coxian modeling outperforms a range of baseline models for the task of activity segmentation. We also achieve a recognition accuracy competitive with the current state-of-the-art multinomial duration model, while gaining a significant reduction in computation. Furthermore, cross-validation model selection on the number of phases K in the Coxian indicates that only a small K is required to achieve optimal performance. Finally, our models are further tested in a more challenging setting in which the tracking is often lost and the set of activities considerably overlap. With a small number of labels supplied during training in a partially supervised learning mode, our models are again able to deliver reliable performance with a small number of phases, making our proposed framework an attractive choice for activity modeling.
    @ARTICLE { duong_phung_bui_venkatesh_aij09,
        AUTHOR = { Duong, Thi and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },
        TITLE = { Efficient duration and hierarchical modeling for human activity recognition },
        JOURNAL = { Artificial Intelligence (AIJ) },
        YEAR = { 2009 },
        VOLUME = { 173 },
        NUMBER = { 7-8 },
        PAGES = { 830--856 },
        ABSTRACT = { A challenge in building pervasive and smart spaces is to learn and recognize human activities of daily living (ADLs). In this paper, we address this problem and argue that in dealing with ADLs, it is beneficial to exploit both their typical duration patterns and inherent hierarchical structures. We exploit efficient duration modeling using the novel Coxian distribution to form the Coxian hidden semi-Markov model (CxHSMM) and apply it to the problem of learning and recognizing ADLs with complex temporal dependencies. The Coxian duration model has several advantages over existing duration parameterizations using multinomial or exponential family distributions, including its denseness in the space of non-negative distributions, low number of parameters, computational efficiency and the existence of closed-form estimation solutions. Further, we combine both hierarchical and duration extensions of the hidden Markov model (HMM) to form the novel switching hidden semi-Markov model (SHSMM), and empirically compare its performance with existing models. The model can learn what an occupant normally does during the day from unsegmented training data and then perform online activity classification, segmentation and abnormality detection. Experimental results show that Coxian modeling outperforms a range of baseline models for the task of activity segmentation. We also achieve a recognition accuracy competitive with the current state-of-the-art multinomial duration model, while gaining a significant reduction in computation. Furthermore, cross-validation model selection on the number of phases K in the Coxian indicates that only a small K is required to achieve optimal performance. Finally, our models are further tested in a more challenging setting in which the tracking is often lost and the set of activities considerably overlap. With a small number of labels supplied during training in a partially supervised learning mode, our models are again able to deliver reliable performance with a small number of phases, making our proposed framework an attractive choice for activity modeling. },
        CODE = { https://github.com/DASCIMAL/CxHSMM },
        COMMENT = { coauthor },
        DOI = { http://dx.doi.org/10.1016/j.artint.2008.12.005 },
        FILE = { :duong_phung_bui_venkatesh_aij09 - Efficient Duration and Hierarchical Modeling for Human Activity Recognition.pdf:PDF },
        KEYWORDS = { activity, recognition, duration modeling, Coxian, Hidden semi-Markov model, HSMM , smart surveillance },
        OWNER = { 184698H },
        PUBLISHER = { Elsevier },
        TIMESTAMP = { 2010.08.11 },
        URL = { http://www.sciencedirect.com/science/article/pii/S0004370208002142 },
    }
J
  • MCMC for Hierarchical Semi-Markov Conditional Random Fields
    Tran, Truyen, Phung, Dinh, Bui, Hung and Venkatesh, Svetha. In Proc. of Workshop on Deep Learning for Speech Recognition and Related Applications, in conjunction with the Neural Information Processing Systems Conference (NIPS), Whistler, BC, Canada, December 2009. [ | ]
    Deep architectures such as hierarchical semi-Markov models are an important class of models for nested sequential data. Current exact inference schemes either cost cubic time in sequence length or exponential time in model depth. These costs are prohibitive for large-scale problems with arbitrary length and depth. In this contribution, we propose a new approximation technique that may achieve sub-cubic time complexity in length and linear time in depth, at the cost of some loss of quality. The idea is based on two well-known methods: Gibbs sampling and Rao-Blackwellisation. We provide a simulation-based evaluation of the quality of the RGBS with respect to run time and sequence length.
    @INPROCEEDINGS { truyen_phung_bui_venkatesh_nips09,
        TITLE = { {MCMC} for Hierarchical Semi-Markov Conditional Random Fields },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of Workshop on Deep Learning for Speech Recognition and Related Applications, in conjunction with the Neural Information Processing Systems Conference (NIPS) },
        YEAR = { 2009 },
        ADDRESS = { Whistler, BC, Canada },
        MONTH = { December },
        ABSTRACT = { Deep architectures such as hierarchical semi-Markov models are an important class of models for nested sequential data. Current exact inference schemes either cost cubic time in sequence length or exponential time in model depth. These costs are prohibitive for large-scale problems with arbitrary length and depth. In this contribution, we propose a new approximation technique that may achieve sub-cubic time complexity in length and linear time in depth, at the cost of some loss of quality. The idea is based on two well-known methods: Gibbs sampling and Rao-Blackwellisation. We provide a simulation-based evaluation of the quality of the RGBS with respect to run time and sequence length. },
        COMMENT = { coauthor },
        OWNER = { Dinh Phung },
        TIMESTAMP = { 2010.06.29 },
    }
C
  • Ordinal Boltzmann Machines for Collaborative Filtering
    Truyen Tran, Dinh Phung and Svetha Venkatesh. In Proc. of the 25th Conference on Uncertainty in Artificial Intelligence (UAI), pages 548-556, Arlington, Virginia, United States, June 2009. (Runner-up Best Paper Award). [ | | pdf]
    Collaborative filtering is an effective recommendation technique wherein the preference of an individual can potentially be predicted based on preferences of other members. Early algorithms often relied on the strong locality in the preference data, that is, it is enough to predict the preference of a user on a particular item based on a small subset of other users with similar tastes or of other items with similar properties. More recently, dimensionality reduction techniques have proved to be equally competitive, and these are based on co-occurrence patterns rather than locality. This paper explores and extends a probabilistic model known as the Boltzmann Machine for collaborative filtering tasks. It seamlessly integrates both the similarity and co-occurrence in a principled manner. In particular, we study parameterisation options to deal with the ordinal nature of the preferences, and propose a joint modelling of both the user-based and item-based processes. Experiments on moderate and large-scale movie recommendation show that our framework rivals existing well-known methods.
    @INPROCEEDINGS { truyen_phung_venkatesh_uai09,
        AUTHOR = { Truyen Tran and Dinh Phung and Svetha Venkatesh },
        TITLE = { Ordinal Boltzmann Machines for Collaborative Filtering },
        BOOKTITLE = { Proc. of the 25th Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2009 },
        SERIES = { UAI '09 },
        PAGES = { 548--556 },
        ADDRESS = { Arlington, Virginia, United States },
        MONTH = { June },
        PUBLISHER = { AUAI Press },
        NOTE = { Runner-up Best Paper Award },
        ABSTRACT = { Collaborative filtering is an effective recommendation technique wherein the preference of an individual can potentially be predicted based on preferences of other members. Early algorithms often relied on the strong locality in the preference data, that is, it is enough to predict the preference of a user on a particular item based on a small subset of other users with similar tastes or of other items with similar properties. More recently, dimensionality reduction techniques have proved to be equally competitive, and these are based on co-occurrence patterns rather than locality. This paper explores and extends a probabilistic model known as the Boltzmann Machine for collaborative filtering tasks. It seamlessly integrates both the similarity and co-occurrence in a principled manner. In particular, we study parameterisation options to deal with the ordinal nature of the preferences, and propose a joint modelling of both the user-based and item-based processes. Experiments on moderate and large-scale movie recommendation show that our framework rivals existing well-known methods. },
        ACMID = { 1795178 },
        COMMENT = { coauthor },
        FILE = { :truyen_phung_venkatesh_uai09 - Ordinal Boltzmann Machines for Collaborative Filtering.pdf:PDF },
        ISBN = { 978-0-9749039-5-8 },
        LOCATION = { Montreal, Quebec, Canada },
        NUMPAGES = { 9 },
        OWNER = { Dinh Phung },
        TIMESTAMP = { 2009.09.22 },
        URL = { http://dl.acm.org/citation.cfm?id=1795114.1795178 },
    }
C
  • The Hidden Permutation Model and Location-Based Activity Recognition
    Bui, Hung, Phung, Dinh, Venkatesh, Svetha and Phan, Hai. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 1345-1350, Chicago, USA, July 2008. [ | | pdf]
    Permutation modeling is challenging because of the combinatorial nature of the problem. However, such modeling is often required in many real-world applications, including activity recognition where subactivities are often permuted and partially ordered. This paper introduces a novel Hidden Permutation Model (HPM) that can learn the partial ordering constraints in permuted state sequences. The HPM is parameterized as an exponential family distribution and is flexible so that it can encode constraints via different feature functions. A chain-flipping Metropolis-Hastings Markov chain Monte Carlo (MCMC) method is employed for inference to overcome the O(n!) complexity. Gradient-based maximum likelihood parameter learning is presented for two cases: when the permutation is known and when it is hidden. The HPM is evaluated using both simulated and real data from a location-based activity recognition domain. Experimental results indicate that the HPM performs far better than other baseline models, including the naive Bayes classifier, the HMM classifier, and Kirshner's multinomial permutation model. Our presented HPM is generic and can potentially be utilized in any problem where the modeling of permuted states from noisy data is needed.
    @INPROCEEDINGS { bui_phung_venkatesh_phan_aaai08,
        TITLE = { The Hidden Permutation Model and Location-Based Activity Recognition },
        AUTHOR = { Bui, Hung and Phung, Dinh and Venkatesh, Svetha and Phan, Hai },
        BOOKTITLE = { Proceedings of the National Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2008 },
        ADDRESS = { Chicago, USA },
        MONTH = { July },
        PAGES = { 1345--1350 },
        VOLUME = { 8 },
        ABSTRACT = { Permutation modeling is challenging because of the combinatorial nature of the problem. However, such modeling is often required in many real-world applications, including activity recognition where subactivities are often permuted and partially ordered. This paper introduces a novel Hidden Permutation Model (HPM) that can learn the partial ordering constraints in permuted state sequences. The HPM is parameterized as an exponential family distribution and is flexible so that it can encode constraints via different feature functions. A chain-flipping Metropolis-Hastings Markov chain Monte Carlo (MCMC) method is employed for inference to overcome the O(n!) complexity. Gradient-based maximum likelihood parameter learning is presented for two cases: when the permutation is known and when it is hidden. The HPM is evaluated using both simulated and real data from a location-based activity recognition domain. Experimental results indicate that the HPM performs far better than other baseline models, including the naive Bayes classifier, the HMM classifier, and Kirshner's multinomial permutation model. Our presented HPM is generic and can potentially be utilized in any problem where the modeling of permuted states from noisy data is needed. },
        FILE = { :papers\\phung\\bui_phung_venkatesh_phan_aaai08.pdf:PDF },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { http://www.aaai.org/Papers/AAAI/2008/AAAI08-213.pdf },
    }
C
  • Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data
    Tran, Truyen, Phung, Dinh, Bui, Hung and Venkatesh, Svetha. Advances in Neural Information Processing Systems (NIPS), December 2008. [ | ]
    Inspired by the hierarchical hidden Markov models (HHMM), we present the hierarchical conditional random field (HCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We demonstrate the HCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases.
    @ARTICLE { truyen_phung_bui_venkatesh_nips08,
        TITLE = { Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },
        JOURNAL = { Advances in Neural Information Processing Systems (NIPS) },
        YEAR = { 2008 },
        MONTH = { December },
        ABSTRACT = { Inspired by the hierarchical hidden Markov models (HHMM), we present the hierarchical conditional random field (HCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We demonstrate the HCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. },
        ADDRESS = { Vancouver, Canada },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { 2008/conferences/truyen_phung_bui_venkatesh_nips08.pdf },
    }
J
  • AdaBoost.MRF: Boosted Markov Random Forests and Application to Multilevel Activity Recognition
    Tran, Truyen, Phung, Dinh, Bui, Hung and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 1686-1693, New York, USA, June 2006. [ | ]
    Activity recognition is an important issue in building intelligent monitoring systems. We address the recognition of multilevel activities in this paper via a conditional Markov random field (MRF), known as the dynamic conditional random field (DCRF). Parameter estimation in general MRFs using maximum likelihood is known to be computationally challenging (except for extreme cases), and thus we propose an efficient boosting-based algorithm AdaBoost.MRF for this task. Distinct from most existing work, our algorithm can handle hidden variables (missing labels) and is particularly attractive for smarthouse domains where reliable labels are often sparsely observed. Furthermore, our method works exclusively on trees and thus is guaranteed to converge. We apply the AdaBoost.MRF algorithm to a home video surveillance application and demonstrate its efficacy.
    @INPROCEEDINGS { truyen_phung_bui_venkatesh_cvpr06,
        TITLE = { {AdaBoost.MRF}: Boosted {M}arkov Random Forests and Application to Multilevel Activity Recognition },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR) },
        YEAR = { 2006 },
        ADDRESS = { New York, USA },
        MONTH = { June },
        PAGES = { 1686--1693 },
        ABSTRACT = { Activity recognition is an important issue in building intelligent monitoring systems. We address the recognition of multilevel activities in this paper via a conditional Markov random field (MRF), known as the dynamic conditional random field (DCRF). Parameter estimation in general MRFs using maximum likelihood is known to be computationally challenging (except for extreme cases), and thus we propose an efficient boosting-based algorithm AdaBoost.MRF for this task. Distinct from most existing work, our algorithm can handle hidden variables (missing labels) and is particularly attractive for smarthouse domains where reliable labels are often sparsely observed. Furthermore, our method works exclusively on trees and thus is guaranteed to converge. We apply the AdaBoost.MRF algorithm to a home video surveillance application and demonstrate its efficacy. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Activity Recognition and Abnormality Detection with the Switching Hidden Semi-Markov Model
    Duong, Thi, Bui, Hung, Phung, Dinh and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 838-845, San Diego, 20-26 June 2005. [ | ]
    This paper addresses the problem of learning and recognizing human activities of daily living (ADL), which is an important research issue in building a pervasive and smart environment. In dealing with ADL, we argue that it is beneficial to exploit both the inherent hierarchical organization of the activities and their typical duration. To this end, we introduce the Switching Hidden Semi-Markov Model (S-HSMM), a two-layered extension of the hidden semi-Markov model (HSMM) for the modeling task. Activities are modeled in the S-HSMM in two ways: the bottom layer represents atomic activities and their duration using HSMMs; the top layer represents a sequence of high-level activities where each high-level activity is made of a sequence of atomic activities. We consider two methods for modeling duration: the classic explicit duration model using multinomial distribution, and the novel use of the discrete Coxian distribution. In addition, we propose an effective scheme to detect abnormality without the need for training on abnormal data. Experimental results show that the S-HSMM performs better than existing models including the flat HSMM and the hierarchical hidden Markov model in both classification and abnormality detection tasks, alleviating the need for presegmented training data. Furthermore, our discrete Coxian duration model yields better computation time and generalization error than the classic explicit duration model.
    @INPROCEEDINGS { duong_bui_phung_venkatesh_cvpr05,
        TITLE = { Activity Recognition and Abnormality Detection with the {S}witching {H}idden {S}emi-{M}arkov {M}odel },
        AUTHOR = { Duong, Thi and Bui, Hung and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR) },
        YEAR = { 2005 },
        ADDRESS = { San Diego },
        MONTH = { 20-26 June },
        PAGES = { 838--845 },
        PUBLISHER = { IEEE Computer Society },
        VOLUME = { 1 },
        ABSTRACT = { This paper addresses the problem of learning and recognizing human activities of daily living (ADL), which is an important research issue in building a pervasive and smart environment. In dealing with ADL, we argue that it is beneficial to exploit both the inherent hierarchical organization of the activities and their typical duration. To this end, we introduce the Switching Hidden Semi-Markov Model (S-HSMM), a two-layered extension of the hidden semi-Markov model (HSMM) for the modeling task. Activities are modeled in the S-HSMM in two ways: the bottom layer represents atomic activities and their duration using HSMMs; the top layer represents a sequence of high-level activities where each high-level activity is made of a sequence of atomic activities. We consider two methods for modeling duration: the classic explicit duration model using multinomial distribution, and the novel use of the discrete Coxian distribution. In addition, we propose an effective scheme to detect abnormality without the need for training on abnormal data. Experimental results show that the S-HSMM performs better than existing models including the flat HSMM and the hierarchical hidden Markov model in both classification and abnormality detection tasks, alleviating the need for presegmented training data. Furthermore, our discrete Coxian duration model yields better computation time and generalization error than the classic explicit duration model. },
        KEYWORDS = { Activity Recognition, Abnormality detection, semi-Markov, hierarchical HSMM },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Model
    Nguyen, N., Phung, Dinh, Bui, Hung and Venkatesh, Svetha. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 955-960, San Diego, 2005. [ | ]
    Directly modeling the inherent hierarchy and shared structures of human behaviors, we present an application of the hierarchical hidden Markov model (HHMM) for the problem of activity recognition. We argue that to robustly model and recognize complex human activities, it is crucial to exploit both the natural hierarchical decomposition and shared semantics embedded in the movement trajectories. To this end, we propose the use of the HHMM, a rich stochastic model that has been recently extended to handle shared structures, for representing and recognizing a set of complex indoor activities. Furthermore, given the need for real-time recognition, we propose a Rao-Blackwellised particle filter (RBPF) that efficiently computes the filtering distribution at a constant time complexity for each new observation arrival. The main contributions of this paper lie in the application of the shared-structure HHMM, the estimation of the model's parameters at all levels simultaneously, and a construction of an RBPF approximate inference scheme. The experimental results in a real-world environment have confirmed our belief that directly modeling shared structures not only reduces computational cost, but also improves recognition accuracy when compared with the tree HHMM and the flat HMM.
    @INPROCEEDINGS { nguyen_phung_bui_venkatesh_cvpr05,
        TITLE = { Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Model },
        AUTHOR = { Nguyen, N. and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR) },
        YEAR = { 2005 },
        ADDRESS = { San Diego },
        PAGES = { 955--960 },
        PUBLISHER = { IEEE Computer Society },
        VOLUME = { 1 },
        ABSTRACT = { Directly modeling the inherent hierarchy and shared structures of human behaviors, we present an application of the hierarchical hidden Markov model (HHMM) for the problem of activity recognition. We argue that to robustly model and recognize complex human activities, it is crucial to exploit both the natural hierarchical decomposition and shared semantics embedded in the movement trajectories. To this end, we propose the use of the HHMM, a rich stochastic model that has been recently extended to handle shared structures, for representing and recognizing a set of complex indoor activities. Furthermore, given the need for real-time recognition, we propose a Rao-Blackwellised particle filter (RBPF) that efficiently computes the filtering distribution at a constant time complexity for each new observation arrival. The main contributions of this paper lie in the application of the shared-structure HHMM, the estimation of the model's parameters at all levels simultaneously, and a construction of an RBPF approximate inference scheme. The experimental results in a real-world environment have confirmed our belief that directly modeling shared structures not only reduces computational cost, but also improves recognition accuracy when compared with the tree HHMM and the flat HMM. },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Topic Transition Detection Using Hierarchical Hidden Markov and Semi-Markov Models
    Phung, Dinh, Duong, Thi, Bui, Hung and Venkatesh, Svetha. In Proc. of ACM Intl. Conf. on Multimedia (ACM-MM), Singapore, 6--11 Nov. 2005. [ | ]
    In this paper we introduce a probabilistic framework to exploit hierarchy, structure sharing and duration information for topic transition detection in videos. Our probabilistic detection framework is a combination of a shot classification step and a detection phase using hierarchical probabilistic models. We consider two models in this paper: the extended Hierarchical Hidden Markov Model (HHMM) and the Coxian Switching Hidden semi-Markov Model (S-HSMM) because they allow the natural decomposition of semantics in videos, including shared structures, to be modeled directly, and thus enable efficient inference and reduce the sample complexity in learning. Additionally, the S-HSMM allows the duration information to be incorporated, consequently the modeling of long-term dependencies in videos is enriched through both hierarchical and duration modeling. Furthermore, the use of Coxian distribution in the S-HSMM makes it tractable to deal with long sequences in video. Our experimentation of the proposed framework on twelve educational and training videos shows that both models outperform the baseline cases (flat HMM and HSMM) and performances reported in earlier work in topic detection. The superior performance of the S-HSMM over the HHMM verifies our belief that the duration information is an important factor in video content modeling.
    @INPROCEEDINGS { phung_duong_bui_venkatesh_acmmm05,
        TITLE = { Topic Transition Detection Using Hierarchical Hidden Markov and Semi-Markov Models },
        AUTHOR = { Phung, Dinh and Duong, Thi and Bui, Hung and Venkatesh, Svetha },
        BOOKTITLE = { Proc. of ACM Intl. Conf. on Multimedia (ACM-MM) },
        YEAR = { 2005 },
        ADDRESS = { Singapore },
        MONTH = { 6--11 Nov. },
        ABSTRACT = { In this paper we introduce a probabilistic framework to exploit hierarchy, structure sharing and duration information for topic transition detection in videos. Our probabilistic detection framework is a combination of a shot classification step and a detection phase using hierarchical probabilistic models. We consider two models in this paper: the extended Hierarchical Hidden Markov Model (HHMM) and the Coxian Switching Hidden semi-Markov Model (S-HSMM) because they allow the natural decomposition of semantics in videos, including shared structures, to be modeled directly, and thus enable efficient inference and reduce the sample complexity in learning. Additionally, the S-HSMM allows the duration information to be incorporated, consequently the modeling of long-term dependencies in videos is enriched through both hierarchical and duration modeling. Furthermore, the use of Coxian distribution in the S-HSMM makes it tractable to deal with long sequences in video. Our experimentation of the proposed framework on twelve educational and training videos shows that both models outperform the baseline cases (flat HMM and HSMM) and performances reported in earlier work in topic detection. The superior performance of the S-HSMM over the HHMM verifies our belief that the duration information is an important factor in video content modeling. },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Hierarchical Hidden Markov Models with General State Hierarchy
    Bui, Hung, Phung, Dinh and Venkatesh, Svetha. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 324-329, San Jose, California, USA, 2004. [ | | pdf]
    The hierarchical hidden Markov model (HHMM) is an extension of the hidden Markov model to include a hierarchy of the hidden states. This form of hierarchical modeling has been found useful in applications such as handwritten recognition, behavior recognition, video indexing, and text retrieval. Nevertheless, the state hierarchy in the original HHMM is restricted to a tree structure. This prohibits two different states from having the same child, and thus does not allow for sharing of common substructures in the model. In this paper, we present a general HHMM in which the state hierarchy can be a lattice allowing arbitrary sharing of substructures. Furthermore, we provide a method for numerical scaling to avoid underflow, an important issue in dealing with long observation sequences. We demonstrate the working of our method in a simulated environment where a hierarchical behavioral model is automatically learned and later used for recognition.
    @INPROCEEDINGS { bui_phung_venkatesh_aaai04,
        TITLE = { Hierarchical Hidden Markov Models with General State Hierarchy },
        AUTHOR = { Bui, Hung and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proceedings of the National Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2004 },
        ADDRESS = { San Jose, California, USA },
        EDITOR = { McGuinness, Deborah L. and Ferguson, George },
        PAGES = { 324--329 },
        PUBLISHER = { MIT Press },
        ABSTRACT = { The hierarchical hidden Markov model (HHMM) is an extension of the hidden Markov model to include a hierarchy of the hidden states. This form of hierarchical modeling has been found useful in applications such as handwritten recognition, behavior recognition, video indexing, and text retrieval. Nevertheless, the state hierarchy in the original HHMM is restricted to a tree structure. This prohibits two different states from having the same child, and thus does not allow for sharing of common substructures in the model. In this paper, we present a general HHMM in which the state hierarchy can be a lattice allowing arbitrary sharing of substructures. Furthermore, we provide a method for numerical scaling to avoid underflow, an important issue in dealing with long observation sequences. We demonstrate the working of our method in a simulated environment where a hierarchical behavioral model is automatically learned and later used for recognition. },
        FILE = { :papers\\phung\\bui_phung_venkatesh_aaai04.pdf:PDF },
        GROUP = { Statistics, Hierarchical Hidden Markov Models (HMM,HHMM) },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { http://www.aaai.org/Papers/AAAI/2004/AAAI04-052.pdf },
    }
C
2022
  • Improving kernel online learning with a snapshot memory
    Trung Le, Khanh Nguyen and Dinh Phung. Machine Learning, January 2022. [ | | pdf]
    We propose in this paper the Stochastic Variance-reduced Gradient Descent for Kernel Online Learning (DualSVRG), which obtains the ε-approximate linear convergence rate and is not vulnerable to the curse of kernelization. Our approach uses a variance reduction technique to reduce the variance when estimating full gradient, and further exploits recent work in dual space gradient descent for online learning to achieve model optimality. This is achieved by introducing the concept of an instant memory, which is a snapshot storing the most recent incoming data instances and proposing three transformer oracles, namely budget, coverage, and always-move oracles. We further develop rigorous theoretical analysis to demonstrate that our proposed approach can obtain the ε-approximate linear convergence rate, while maintaining model sparsity, hence encourages fast training. We conduct extensive experiments on several benchmark datasets to compare our DualSVRG with state-of-the-art baselines in both batch and online settings. The experimental results show that our DualSVRG yields superior predictive performance, while spending comparable training time with baselines.
    @ARTICLE { le_etal_ml22_improving,
        AUTHOR = { Trung Le and Khanh Nguyen and Dinh Phung },
        JOURNAL = { Machine Learning },
        TITLE = { Improving kernel online learning with a snapshot memory },
        YEAR = { 2022 },
        MONTH = { jan },
        PAGES = { 1--22 },
        ABSTRACT = { We propose in this paper the Stochastic Variance-reduced Gradient Descent for Kernel Online Learning (DualSVRG), which obtains the ε-approximate linear convergence rate and is not vulnerable to the curse of kernelization. Our approach uses a variance reduction technique to reduce the variance when estimating full gradient, and further exploits recent work in dual space gradient descent for online learning to achieve model optimality. This is achieved by introducing the concept of an instant memory, which is a snapshot storing the most recent incoming data instances and proposing three transformer oracles, namely budget, coverage, and always-move oracles. We further develop rigorous theoretical analysis to demonstrate that our proposed approach can obtain the ε-approximate linear convergence rate, while maintaining model sparsity, hence encourages fast training. We conduct extensive experiments on several benchmark datasets to compare our DualSVRG with state-of-the-art baselines in both batch and online settings. The experimental results show that our DualSVRG yields superior predictive performance, while spending comparable training time with baselines. },
        DOI = { 10.1007/s10994-021-06075-7 },
        TIMESTAMP = { 2022-03-19 },
        URL = { https://link.springer.com/article/10.1007/s10994-021-06075-7 },
    }
J
  • Node Co-Occurrence Based Graph Neural Networks for Knowledge Graph Link Prediction
    Nguyen, Dai Quoc, Tong, Vinh, Phung, Dinh and Nguyen, Dat Quoc. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, pages 1589–1592, New York, NY, USA, 2022. [ | | pdf]
    We introduce a novel embedding model, named NoGE, which aims to integrate co-occurrence among entities and relations into graph neural networks to improve knowledge graph completion (i.e., link prediction). Given a knowledge graph, NoGE constructs a single graph considering entities and relations as individual nodes. NoGE then computes weights for edges among nodes based on the co-occurrence of entities and relations. Next, NoGE proposes Dual Quaternion Graph Neural Networks (DualQGNN) and utilizes DualQGNN to update vector representations for entity and relation nodes. NoGE then adopts a score function to produce the triple scores. Comprehensive experimental results show that NoGE obtains state-of-the-art results on three new and difficult benchmark datasets CoDEx for knowledge graph completion.
    @INPROCEEDINGS { nguyen_etal_wsdm22_node_cooccurrence,
        AUTHOR = { Nguyen, Dai Quoc and Tong, Vinh and Phung, Dinh and Nguyen, Dat Quoc },
        TITLE = { Node Co-Occurrence Based Graph Neural Networks for Knowledge Graph Link Prediction },
        YEAR = { 2022 },
        ISBN = { 9781450391320 },
        PUBLISHER = { Association for Computing Machinery },
        ADDRESS = { New York, NY, USA },
        URL = { https://doi.org/10.1145/3488560.3502183 },
        DOI = { 10.1145/3488560.3502183 },
        ABSTRACT = { We introduce a novel embedding model, named NoGE, which aims to integrate co-occurrence among entities and relations into graph neural networks to improve knowledge graph completion (i.e., link prediction). Given a knowledge graph, NoGE constructs a single graph considering entities and relations as individual nodes. NoGE then computes weights for edges among nodes based on the co-occurrence of entities and relations. Next, NoGE proposes Dual Quaternion Graph Neural Networks (DualQGNN) and utilizes DualQGNN to update vector representations for entity and relation nodes. NoGE then adopts a score function to produce the triple scores. Comprehensive experimental results show that NoGE obtains state-of-the-art results on three new and difficult benchmark datasets CoDEx for knowledge graph completion. },
        BOOKTITLE = { Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining },
        PAGES = { 1589--1592 },
        NUMPAGES = { 4 },
        KEYWORDS = { graph neural networks, knowledge graph embeddings, knowledge graph completion, quaternion },
        LOCATION = { Virtual Event, AZ, USA },
        SERIES = { WSDM '22 },
    }
C
  • Vietnamese Speech-Based Question Answering over Car Manuals
    Vo, Tin Duy, Luong, Manh, Le, Duong Minh, Tran, Hieu, Do, Nhan, Nguyen, Tuan-Duy H., Nguyen, Thien, Bui, Hung, Nguyen, Dat Quoc and Phung, Dinh. In 27th International Conference on Intelligent User Interfaces, pages 117–119, New York, NY, USA, 2022. [ | | pdf]
    This paper presents a novel Vietnamese speech-based question answering system QA-CarManual that enables users to ask car-manual-related questions (e.g. how to properly operate devices and/or utilities within a car). Given a car manual written in Vietnamese as the main knowledge base, we develop QA-CarManual as a lightweight, real-time and interactive system that integrates state-of-the-art technologies in language and speech processing to (i) understand and interact with users via speech commands and (ii) automatically query the knowledge base and return answers in both forms of text and speech as well as visualization. To our best knowledge, QA-CarManual is the first Vietnamese question answering system that interacts with users via speech inputs and outputs. We perform a human evaluation to assess the quality of our QA-CarManual system and obtain promising results.
    @INPROCEEDINGS { vo_etal_iui22_vietnamese_speech,
        AUTHOR = { Vo, Tin Duy and Luong, Manh and Le, Duong Minh and Tran, Hieu and Do, Nhan and Nguyen, Tuan-Duy H. and Nguyen, Thien and Bui, Hung and Nguyen, Dat Quoc and Phung, Dinh },
        TITLE = { Vietnamese Speech-Based Question Answering over Car Manuals },
        YEAR = { 2022 },
        ISBN = { 9781450391450 },
        PUBLISHER = { Association for Computing Machinery },
        ADDRESS = { New York, NY, USA },
        URL = { https://doi.org/10.1145/3490100.3516525 },
        DOI = { 10.1145/3490100.3516525 },
        ABSTRACT = { This paper presents a novel Vietnamese speech-based question answering system QA-CarManual that enables users to ask car-manual-related questions (e.g. how to properly operate devices and/or utilities within a car). Given a car manual written in Vietnamese as the main knowledge base, we develop QA-CarManual as a lightweight, real-time and interactive system that integrates state-of-the-art technologies in language and speech processing to (i) understand and interact with users via speech commands and (ii) automatically query the knowledge base and return answers in both forms of text and speech as well as visualization. To our best knowledge, QA-CarManual is the first Vietnamese question answering system that interacts with users via speech inputs and outputs. We perform a human evaluation to assess the quality of our QA-CarManual system and obtain promising results. },
        BOOKTITLE = { 27th International Conference on Intelligent User Interfaces },
        PAGES = { 117--119 },
        NUMPAGES = { 3 },
        LOCATION = { Helsinki, Finland },
        SERIES = { IUI '22 Companion },
    }
C
  • A Unified Wasserstein Distributional Robustness Framework for Adversarial Training
    Anh Tuan Bui, Trung Le, Quan Hung Tran, He Zhao and Dinh Phung. ICLR, (in press). [ | ]
    It is well-known that deep neural networks (DNNs) are susceptible to adversarial attacks, exposing a severe fragility of deep learning systems. As a result, adversarial training (AT), by incorporating adversarial examples during training, represents a natural and effective approach to strengthen the robustness of a DNN-based classifier. However, most AT-based methods, notably PGD-AT and TRADES, typically seek a pointwise adversary that generates the worst-case adversarial example by independently perturbing each data sample, as a way to "probe" the vulnerability of the classifier. Arguably, there are unexplored benefits in considering such adversarial effects from an entire distribution. To this end, this paper presents a unified framework that connects Wasserstein distributional robustness with current state-of-the-art AT methods. We introduce a new Wasserstein cost function and a new series of risk functions, with which we show that standard AT methods are special cases of their counterparts in our framework. This connection leads to an intuitive relaxation and generalization of existing AT methods and facilitates the development of a new family of distributional robustness AT-based algorithms. Extensive experiments show that our distributional robustness AT algorithms robustify further their standard AT counterparts in various settings.
    @MISC { accepted_anh_etal_iclr22_a_unified_wasserstein,
        AUTHOR = { Anh Tuan Bui and Trung Le and Quan Hung Tran and He Zhao and Dinh Phung },
        TITLE = { A Unified Wasserstein Distributional Robustness Framework for Adversarial Training },
        JOURNAL = { ICLR },
        ABSTRACT = { It is well-known that deep neural networks (DNNs) are susceptible to adversarial attacks, exposing a severe fragility of deep learning systems. As a result, adversarial training (AT), by incorporating adversarial examples during training, represents a natural and effective approach to strengthen the robustness of a DNN-based classifier. However, most AT-based methods, notably PGD-AT and TRADES, typically seek a pointwise adversary that generates the worst-case adversarial example by independently perturbing each data sample, as a way to ``probe'' the vulnerability of the classifier. Arguably, there are unexplored benefits in considering such adversarial effects from an entire distribution. To this end, this paper presents a unified framework that connects Wasserstein distributional robustness with current state-of-the-art AT methods. We introduce a new Wasserstein cost function and a new series of risk functions, with which we show that standard AT methods are special cases of their counterparts in our framework. This connection leads to an intuitive relaxation and generalization of existing AT methods and facilitates the development of a new family of distributional robustness AT-based algorithms. Extensive experiments show that our distributional robustness AT algorithms robustify further their standard AT counterparts in various settings. },
        NOTE = { (in press) },
    }
  • Sobolev Transport: A Scalable Metric for Probability Measures with Graph Metrics
    Le, Tam, Nguyen, Truyen, Phung, Dinh and Nguyen, Viet Anh. AISTATS, (in press). [ | ]
    Optimal transport (OT) is a popular measure to compare probability distributions. However, OT suffers a few drawbacks such as (i) a high complexity for computation, (ii) indefiniteness which limits its applicability to kernel machines. In this work, we consider probability measures supported on a graph metric space and propose a novel Sobolev transport metric. We show that the Sobolev transport metric yields a closed-form formula for fast computation and it is negative definite. We show that the space of probability measures endowed with this transport distance is isometric to a bounded convex set in a Euclidean space with a weighted $\ell_p$ distance. We further exploit the negative definiteness of the Sobolev transport to design positive-definite kernels, and evaluate their performances against other baselines in document classification with word embeddings and in topological data analysis.
    @MISC { accepted_le_etal_aistats22_sobolev_transport,
        AUTHOR = { Le, Tam and Nguyen, Truyen and Phung, Dinh and Nguyen, Viet Anh },
        TITLE = { Sobolev Transport: A Scalable Metric for Probability Measures with Graph Metrics },
        JOURNAL = { AISTATS },
        ABSTRACT = { Optimal transport (OT) is a popular measure to compare probability distributions. However, OT suffers a few drawbacks such as (i) a high complexity for computation, (ii) indefiniteness which limits its applicability to kernel machines. In this work, we consider probability measures supported on a graph metric space and propose a novel Sobolev transport metric. We show that the Sobolev transport metric yields a closed-form formula for fast computation and it is negative definite. We show that the space of probability measures endowed with this transport distance is isometric to a bounded convex set in a Euclidean space with a weighted $\ell_p$ distance. We further exploit the negative definiteness of the Sobolev transport to design positive-definite kernels, and evaluate their performances against other baselines in document classification with word embeddings and in topological data analysis. },
        NOTE = { (in press) },
    }
  • Particle-based Adversarial Local Distribution Regularization
    Nguyen-Duc, Thanh, Le, Trung, Zhao, He, Cai, Jianfei and Phung, Dinh. AISTATS, (in press). [ | ]
    To-be-updated
    @MISC { accepted_nguyen_etal_aistats22_particle_based,
        AUTHOR = { Nguyen-Duc, Thanh and Le, Trung and Zhao, He and Cai, Jianfei and Phung, Dinh },
        TITLE = { Particle-based Adversarial Local Distribution Regularization },
        JOURNAL = { AISTATS },
        ABSTRACT = { To-be-updated },
        NOTE = { (in press) },
    }
2021
  • Neural Topic Model via Optimal Transport
    He Zhao, Dinh Phung, Viet Huynh, Trung Le and Wray Buntine. In Proc. of the 9th Int. Conf. on Learning Representations (ICLR), 2021. [ | ]
    @INPROCEEDINGS { zhao_etal_iclr2021_neural,
        AUTHOR = { He Zhao and Dinh Phung and Viet Huynh and Trung Le and Wray Buntine },
        BOOKTITLE = { Proc. of the 9th Int. Conf. on Learning Representations (ICLR) },
        TITLE = { Neural Topic Model via Optimal Transport },
        YEAR = { 2021 },
        ARCHIVEPREFIX = { arXiv },
        EPRINT = { 2008.13537 },
        PRIMARYCLASS = { cs.IR },
        TIMESTAMP = { 2021-01-13 },
    }
C
  • Improving Ensemble Robustness by Collaboratively Promoting and Demoting Adversarial Robustness
    Anh Bui, Trung Le, He Zhao, Paul Montague, Olivier deVel, Tamas Abraham and Dinh Phung. In Proc. of Int. Conf. on Artificial Intelligence (AAAI), 2021. [ | ]
    Ensemble-based adversarial training is a principled approach to achieve robustness against adversarial attacks. An important technique of this approach is to control the transferability of adversarial examples among ensemble members. We propose in this work a simple yet effective strategy to collaborate among committee models of an ensemble model. This is achieved via the secure and insecure sets defined for each model member on a given sample, hence helping us to quantify and regularize the transferability. Consequently, our proposed framework provides the flexibility to reduce the adversarial transferability as well as to promote the diversity of ensemble members, which are two crucial factors for better robustness in our ensemble approach. We conduct extensive and comprehensive experiments to demonstrate that our proposed method outperforms the state-of-the-art ensemble baselines, while at the same time detecting a wide range of adversarial examples with nearly perfect accuracy.
    @INPROCEEDINGS { bui_etal_20aaai_improving,
        AUTHOR = { Anh Bui and Trung Le and He Zhao and Paul Montague and Olivier deVel and Tamas Abraham and Dinh Phung },
        BOOKTITLE = { Proc. of the AAAI Conference on Artificial Intelligence (AAAI) },
        TITLE = { Improving Ensemble Robustness by Collaboratively Promoting and Demoting Adversarial Robustness },
        YEAR = { 2021 },
        ABSTRACT = { Ensemble-based adversarial training is a principled approach to achieve robustness against adversarial attacks. An important technique of this approach is to control the transferability of adversarial examples among ensemble members. We propose in this work a simple yet effective strategy to collaborate among committee models of an ensemble model. This is achieved via the secure and insecure sets defined for each model member on a given sample, which helps us to quantify and regularize the transferability. Consequently, our proposed framework provides the flexibility to reduce the adversarial transferability as well as to promote the diversity of ensemble members, which are two crucial factors for better robustness in our ensemble approach. We conduct extensive and comprehensive experiments to demonstrate that our proposed method outperforms state-of-the-art ensemble baselines, while at the same time detecting a wide range of adversarial examples with nearly perfect accuracy. },
        DATE = { 2020-09-21 },
        EPRINT = { 2009.09612 },
        EPRINTCLASS = { cs.CV },
        EPRINTTYPE = { arXiv },
        KEYWORDS = { cs.CV, cs.LG },
    }
C
  • Exploiting Domain-Specific Features to Enhance Domain Generalization
    Ha Bui, Toan Tran, Anh Tuan Tran and Dinh Phung. In Advances in Neural Information Processing Systems, 2021. [ | | pdf]
    @INPROCEEDINGS { ha_etal_neurips21_exploiting,
        TITLE = { Exploiting Domain-Specific Features to Enhance Domain Generalization },
        AUTHOR = { Ha Bui and Toan Tran and Anh Tuan Tran and Dinh Phung },
        BOOKTITLE = { Advances in Neural Information Processing Systems },
        EDITOR = { A. Beygelzimer and Y. Dauphin and P. Liang and J. Wortman Vaughan },
        YEAR = { 2021 },
        URL = { https://openreview.net/forum?id=vKxFYApxBjr },
    }
C
  • On Learning Domain-Invariant Representations for Transfer Learning with Multiple Sources
    Trung Quoc Phung, Trung Le, Long Tung Vuong, Toan Tran, Anh Tuan Tran, Hung Bui and Dinh Phung. In Advances in Neural Information Processing Systems, 2021. [ | | pdf]
    @INPROCEEDINGS { trung_etal_neurips21_on_learning_domain_invariant,
        TITLE = { On Learning Domain-Invariant Representations for Transfer Learning with Multiple Sources },
        AUTHOR = { Trung Quoc Phung and Trung Le and Long Tung Vuong and Toan Tran and Anh Tuan Tran and Hung Bui and Dinh Phung },
        BOOKTITLE = { Advances in Neural Information Processing Systems },
        EDITOR = { A. Beygelzimer and Y. Dauphin and P. Liang and J. Wortman Vaughan },
        YEAR = { 2021 },
        URL = { https://openreview.net/forum?id=LkNBNOut0oD },
    }
C
  • Most: multi-source domain adaptation via optimal transport for student-teacher learning
    Nguyen, Tuan, Le, Trung, Zhao, He, Tran, Quan Hung, Nguyen, Truyen and Phung, Dinh. In Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, pages 225-235, 27--30 Jul 2021. [ | | pdf]
    Multi-source domain adaptation (DA) is more challenging than conventional DA because the knowledge is transferred from several source domains to a target domain. To this end, we propose in this paper a novel model for multi-source DA using the theory of optimal transport and imitation learning. More specifically, our approach consists of two cooperative agents: a teacher classifier and a student classifier. The teacher classifier is a combined expert that leverages knowledge of domain experts that can be theoretically guaranteed to handle perfectly source examples, while the student classifier acting on the target domain tries to imitate the teacher classifier acting on the source domains. Our rigorous theory developed based on optimal transport makes this cross-domain imitation possible and also helps to mitigate not only the data shift but also the label shift, which are inherently thorny issues in DA research. We conduct comprehensive experiments on real-world datasets to demonstrate the merit of our approach and its optimal transport based imitation learning viewpoint. Experimental results show that our proposed method achieves state-of-the-art performance on benchmark datasets for multi-source domain adaptation including Digits-five, Office-Caltech10, and Office-31 to the best of our knowledge.
    @INPROCEEDINGS { nguyen_etal_uai21_most,
        TITLE = { Most: multi-source domain adaptation via optimal transport for student-teacher learning },
        AUTHOR = { Nguyen, Tuan and Le, Trung and Zhao, He and Tran, Quan Hung and Nguyen, Truyen and Phung, Dinh },
        BOOKTITLE = { Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence },
        PAGES = { 225--235 },
        YEAR = { 2021 },
        EDITOR = { de Campos, Cassio and Maathuis, Marloes H. },
        VOLUME = { 161 },
        SERIES = { Proceedings of Machine Learning Research },
        MONTH = { 27--30 Jul },
        PUBLISHER = { PMLR },
        PDF = { https://proceedings.mlr.press/v161/nguyen21a/nguyen21a.pdf },
        URL = { https://proceedings.mlr.press/v161/nguyen21a.html },
        ABSTRACT = { Multi-source domain adaptation (DA) is more challenging than conventional DA because the knowledge is transferred from several source domains to a target domain. To this end, we propose in this paper a novel model for multi-source DA using the theory of optimal transport and imitation learning. More specifically, our approach consists of two cooperative agents: a teacher classifier and a student classifier. The teacher classifier is a combined expert that leverages knowledge of domain experts that can be theoretically guaranteed to handle perfectly source examples, while the student classifier acting on the target domain tries to imitate the teacher classifier acting on the source domains. Our rigorous theory developed based on optimal transport makes this cross-domain imitation possible and also helps to mitigate not only the data shift but also the label shift, which are inherently thorny issues in DA research. We conduct comprehensive experiments on real-world datasets to demonstrate the merit of our approach and its optimal transport based imitation learning viewpoint. Experimental results show that our proposed method achieves state-of-the-art performance on benchmark datasets for multi-source domain adaptation including Digits-five, Office-Caltech10, and Office-31 to the best of our knowledge. },
    }
C
  • Quaternion Graph Neural Networks
    Nguyen, Dai Quoc, Nguyen, Tu Dinh and Phung, Dinh. In Proceedings of The 13th Asian Conference on Machine Learning, pages 236-251, 17--19 Nov 2021. [ | | pdf]
    Recently, graph neural networks (GNNs) have become an important and active research direction in deep learning. It is worth noting that most of the existing GNN-based methods learn graph representations within the Euclidean vector space. Beyond the Euclidean space, learning representation and embeddings in hyper-complex space have also shown to be a promising and effective approach. To this end, we propose Quaternion Graph Neural Networks (QGNN) to learn graph representations within the Quaternion space. As demonstrated, the Quaternion space, a hyper-complex vector space, provides highly meaningful computations and analogical calculus through Hamilton product compared to the Euclidean and complex vector spaces. Our QGNN obtains state-of-the-art results on a range of benchmark datasets for graph classification and node classification. Besides, regarding knowledge graphs, our QGNN-based embedding model achieves state-of-the-art results on three new and challenging benchmark datasets for knowledge graph completion. Our code is available at: \url{https://github.com/daiquocnguyen/QGNN}.
    @INPROCEEDINGS { nguyen_etal_acml21_quaternion,
        TITLE = { Quaternion Graph Neural Networks },
        AUTHOR = { Nguyen, Dai Quoc and Nguyen, Tu Dinh and Phung, Dinh },
        BOOKTITLE = { Proceedings of The 13th Asian Conference on Machine Learning },
        PAGES = { 236--251 },
        YEAR = { 2021 },
        EDITOR = { Balasubramanian, Vineeth N. and Tsang, Ivor },
        VOLUME = { 157 },
        SERIES = { Proceedings of Machine Learning Research },
        MONTH = { 17--19 Nov },
        PUBLISHER = { PMLR },
        PDF = { https://proceedings.mlr.press/v157/nguyen21a/nguyen21a.pdf },
        URL = { https://proceedings.mlr.press/v157/nguyen21a.html },
        ABSTRACT = { Recently, graph neural networks (GNNs) have become an important and active research direction in deep learning. It is worth noting that most of the existing GNN-based methods learn graph representations within the Euclidean vector space. Beyond the Euclidean space, learning representation and embeddings in hyper-complex space have also shown to be a promising and effective approach. To this end, we propose Quaternion Graph Neural Networks (QGNN) to learn graph representations within the Quaternion space. As demonstrated, the Quaternion space, a hyper-complex vector space, provides highly meaningful computations and analogical calculus through Hamilton product compared to the Euclidean and complex vector spaces. Our QGNN obtains state-of-the-art results on a range of benchmark datasets for graph classification and node classification. Besides, regarding knowledge graphs, our QGNN-based embedding model achieves state-of-the-art results on three new and challenging benchmark datasets for knowledge graph completion. Our code is available at: \url{https://github.com/daiquocnguyen/QGNN}. },
    }
C
  • Generalised Unsupervised Domain Adaptation of Neural Machine Translation with Cross-Lingual Data Selection
    Vu. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3335-3346, Online and Punta Cana, Nov 2021. [ | | pdf]
    This paper considers the unsupervised domain adaptation problem for neural machine translation (NMT)
    @INPROCEEDINGS { vu_etal_emnlp21_generalised,
        TITLE = { Generalised Unsupervised Domain Adaptation of Neural Machine Translation with Cross-Lingual Data Selection },
        AUTHOR = { Vu },
        BOOKTITLE = { Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing },
        MONTH = { nov },
        YEAR = { 2021 },
        ADDRESS = { Online and Punta Cana },
        PUBLISHER = { Association for Computational Linguistics },
        URL = { https://aclanthology.org/2021.emnlp-main.268 },
        DOI = { 10.18653/v1/2021.emnlp-main.268 },
        PAGES = { 3335--3346 },
        ABSTRACT = { This paper considers the unsupervised domain adaptation problem for neural machine translation (NMT) },
    }
C
  • Information-theoretic Source Code Vulnerability Highlighting
    Nguyen, Van, Le, Trung, De Vel, Olivier, Montague, Paul, Grundy, John and Phung, Dinh. In 2021 International Joint Conference on Neural Networks (IJCNN), pages 1-8, 2021. [ | ]
    @INPROCEEDINGS { nguyen_etal_ijcnn21_information_theoretic,
        AUTHOR = { Nguyen, Van and Le, Trung and De Vel, Olivier and Montague, Paul and Grundy, John and Phung, Dinh },
        BOOKTITLE = { 2021 International Joint Conference on Neural Networks (IJCNN) },
        TITLE = { Information-theoretic Source Code Vulnerability Highlighting },
        YEAR = { 2021 },
        PAGES = { 1--8 },
        DOI = { 10.1109/IJCNN52387.2021.9533907 },
    }
C
  • LAMDA: Label Matching Deep Domain Adaptation
    Le, Trung, Nguyen, Tuan, Ho, Nhat, Bui, Hung and Phung, Dinh. In Proceedings of the 38th International Conference on Machine Learning, pages 6043-6054, 18--24 Jul 2021. [ | | pdf]
    Deep domain adaptation (DDA) approaches have recently been shown to perform better than their shallow rivals with better modeling capacity on complex domains (e.g., image, structural data, and sequential data). The underlying idea is to learn domain invariant representations on a latent space that can bridge the gap between source and target domains. Several theoretical studies have established insightful understanding and the benefit of learning domain invariant features; however, they are usually limited to the case where there is no label shift, hence hindering its applicability. In this paper, we propose and study a new challenging setting that allows us to use a Wasserstein distance (WS) to not only quantify the data shift but also to define the label shift directly. We further develop a theory to demonstrate that minimizing the WS of the data shift leads to closing the gap between the source and target data distributions on the latent space (e.g., an intermediate layer of a deep net), while still being able to quantify the label shift with respect to this latent space. Interestingly, our theory can consequently explain certain drawbacks of learning domain invariant features on the latent space. Finally, grounded on the results and guidance of our developed theory, we propose the Label Matching Deep Domain Adaptation (LAMDA) approach that outperforms baselines on real-world datasets for DA problems.
    @INPROCEEDINGS { le_etal_icml21_lamda,
        TITLE = { LAMDA: Label Matching Deep Domain Adaptation },
        AUTHOR = { Le, Trung and Nguyen, Tuan and Ho, Nhat and Bui, Hung and Phung, Dinh },
        BOOKTITLE = { Proceedings of the 38th International Conference on Machine Learning },
        PAGES = { 6043--6054 },
        YEAR = { 2021 },
        EDITOR = { Meila, Marina and Zhang, Tong },
        VOLUME = { 139 },
        SERIES = { Proceedings of Machine Learning Research },
        MONTH = { 18--24 Jul },
        PUBLISHER = { PMLR },
        PDF = { http://proceedings.mlr.press/v139/le21a/le21a.pdf },
        URL = { https://proceedings.mlr.press/v139/le21a.html },
        ABSTRACT = { Deep domain adaptation (DDA) approaches have recently been shown to perform better than their shallow rivals with better modeling capacity on complex domains (e.g., image, structural data, and sequential data). The underlying idea is to learn domain invariant representations on a latent space that can bridge the gap between source and target domains. Several theoretical studies have established insightful understanding and the benefit of learning domain invariant features; however, they are usually limited to the case where there is no label shift, hence hindering its applicability. In this paper, we propose and study a new challenging setting that allows us to use a Wasserstein distance (WS) to not only quantify the data shift but also to define the label shift directly. We further develop a theory to demonstrate that minimizing the WS of the data shift leads to closing the gap between the source and target data distributions on the latent space (e.g., an intermediate layer of a deep net), while still being able to quantify the label shift with respect to this latent space. Interestingly, our theory can consequently explain certain drawbacks of learning domain invariant features on the latent space. Finally, grounded on the results and guidance of our developed theory, we propose the Label Matching Deep Domain Adaptation (LAMDA) approach that outperforms baselines on real-world datasets for DA problems. },
    }
C
  • Topic Modelling Meets Deep Neural Networks: A Survey
    Zhao, He, Phung, Dinh, Huynh, Viet, Jin, Yuan, Du, Lan and Buntine, Wray. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), pages 4713-4720, August 2021. (Survey Track). [ | | pdf]
    @INPROCEEDINGS { zhao_etal_ijcai21_topic_modelling,
        TITLE = { Topic Modelling Meets Deep Neural Networks: A Survey },
        AUTHOR = { Zhao, He and Phung, Dinh and Huynh, Viet and Jin, Yuan and Du, Lan and Buntine, Wray },
        BOOKTITLE = { Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI-21} },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        EDITOR = { Zhi-Hua Zhou },
        PAGES = { 4713--4720 },
        YEAR = { 2021 },
        MONTH = { 8 },
        NOTE = { Survey Track },
        DOI = { 10.24963/ijcai.2021/638 },
        URL = { https://doi.org/10.24963/ijcai.2021/638 },
    }
C
  • Explain2Attack: Text Adversarial Attacks via Cross-Domain Interpretability
    Hossam, Mahmoud, Le, Trung, Zhao, He and Phung, Dinh. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 8922-8928, 2021. [ | ]
    @INPROCEEDINGS { hossam_etal_icpr21_explain2attack,
        AUTHOR = { Hossam, Mahmoud and Le, Trung and Zhao, He and Phung, Dinh },
        BOOKTITLE = { 2020 25th International Conference on Pattern Recognition (ICPR) },
        TITLE = { Explain2Attack: Text Adversarial Attacks via Cross-Domain Interpretability },
        YEAR = { 2021 },
        PAGES = { 8922--8928 },
        DOI = { 10.1109/ICPR48806.2021.9412526 },
    }
C
  • STEM: An Approach to Multi-Source Domain Adaptation With Guarantees
    Nguyen, Van-Anh, Nguyen, Tuan, Le, Trung, Tran, Quan Hung and Phung, Dinh. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9352-9363, October 2021. [ | ]
    @INPROCEEDINGS { nguyen_etal_iccv21_stem,
        AUTHOR = { Nguyen, Van-Anh and Nguyen, Tuan and Le, Trung and Tran, Quan Hung and Phung, Dinh },
        TITLE = { STEM: An Approach to Multi-Source Domain Adaptation With Guarantees },
        BOOKTITLE = { Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) },
        MONTH = { October },
        YEAR = { 2021 },
        PAGES = { 9352--9363 },
    }
C
  • Application of Machine Learning Techniques to Identify Data Reliability and Factors Affecting Outcome After Stroke Using Electronic Administrative Records
    Rana, Santu, Luo, Wei, Tran, Truyen, Venkatesh, Svetha, Talman, Paul, Phan, Thanh, Phung, Dinh and Clissold, Benjamin. Frontiers in Neurology, 12, 2021. [ | | pdf]
    Aim: To use available electronic administrative records to identify data reliability, predict discharge destination, and identify risk factors associated with specific outcomes following hospital admission with stroke, compared to stroke specific clinical factors, using machine learning techniques. Method: The study included 2,531 patients having at least one admission with a confirmed diagnosis of stroke, collected from a regional hospital in Australia within 2009–2013. Using machine learning (penalized regression with Lasso) techniques, patients having their index admission between June 2009 and July 2012 were used to derive predictive models, and patients having their index admission between July 2012 and June 2013 were used for validation. Three different stroke types [intracerebral hemorrhage (ICH), ischemic stroke, transient ischemic attack (TIA)] were considered and five different comparison outcome settings were considered. Our electronic administrative record based predictive model was compared with a predictive model composed of “baseline” clinical features, more specific for stroke, such as age, gender, smoking habits, co-morbidities (high cholesterol, hypertension, atrial fibrillation, and ischemic heart disease), types of imaging done (CT scan, MRI, etc.), and occurrence of in-hospital pneumonia. Risk factors associated with likelihood of negative outcomes were identified. Results: The data was highly reliable at predicting discharge to rehabilitation and all other outcomes vs. death for ICH (AUC 0.85 and 0.825, respectively), all discharge outcomes except home vs. rehabilitation for ischemic stroke, and discharge home vs. others and home vs. rehabilitation for TIA (AUC 0.948 and 0.873, respectively). Electronic health record data appeared to provide improved prediction of outcomes over stroke specific clinical factors from the machine learning models. Common risk factors associated with a negative impact on expected outcomes appeared clinically intuitive, and included older age groups, prior ventilatory support, urinary incontinence, need for imaging, and need for allied health input. Conclusion: Electronic administrative records from this cohort produced reliable outcome prediction and identified clinically appropriate factors negatively impacting most outcome variables following hospital admission with stroke. This presents a means of future identification of modifiable factors associated with patient discharge destination. This may potentially aid in patient selection for certain interventions and aid in better patient and clinician education regarding expected discharge outcomes.
    @ARTICLE { rana_etal_frontiersin21_application,
        AUTHOR = { Rana, Santu and Luo, Wei and Tran, Truyen and Venkatesh, Svetha and Talman, Paul and Phan, Thanh and Phung, Dinh and Clissold, Benjamin },
        TITLE = { Application of Machine Learning Techniques to Identify Data Reliability and Factors Affecting Outcome After Stroke Using Electronic Administrative Records },
        JOURNAL = { Frontiers in Neurology },
        VOLUME = { 12 },
        YEAR = { 2021 },
        URL = { https://www.frontiersin.org/article/10.3389/fneur.2021.670379 },
        DOI = { 10.3389/fneur.2021.670379 },
        ISSN = { 1664-2295 },
        ABSTRACT = { Aim: To use available electronic administrative records to identify data reliability, predict discharge destination, and identify risk factors associated with specific outcomes following hospital admission with stroke, compared to stroke specific clinical factors, using machine learning techniques. Method: The study included 2,531 patients having at least one admission with a confirmed diagnosis of stroke, collected from a regional hospital in Australia within 2009–2013. Using machine learning (penalized regression with Lasso) techniques, patients having their index admission between June 2009 and July 2012 were used to derive predictive models, and patients having their index admission between July 2012 and June 2013 were used for validation. Three different stroke types [intracerebral hemorrhage (ICH), ischemic stroke, transient ischemic attack (TIA)] were considered and five different comparison outcome settings were considered. Our electronic administrative record based predictive model was compared with a predictive model composed of “baseline” clinical features, more specific for stroke, such as age, gender, smoking habits, co-morbidities (high cholesterol, hypertension, atrial fibrillation, and ischemic heart disease), types of imaging done (CT scan, MRI, etc.), and occurrence of in-hospital pneumonia. Risk factors associated with likelihood of negative outcomes were identified. Results: The data was highly reliable at predicting discharge to rehabilitation and all other outcomes vs. death for ICH (AUC 0.85 and 0.825, respectively), all discharge outcomes except home vs. rehabilitation for ischemic stroke, and discharge home vs. others and home vs. rehabilitation for TIA (AUC 0.948 and 0.873, respectively). Electronic health record data appeared to provide improved prediction of outcomes over stroke specific clinical factors from the machine learning models. Common risk factors associated with a negative impact on expected outcomes appeared clinically intuitive, and included older age groups, prior ventilatory support, urinary incontinence, need for imaging, and need for allied health input. Conclusion: Electronic administrative records from this cohort produced reliable outcome prediction and identified clinically appropriate factors negatively impacting most outcome variables following hospital admission with stroke. This presents a means of future identification of modifiable factors associated with patient discharge destination. This may potentially aid in patient selection for certain interventions and aid in better patient and clinician education regarding expected discharge outcomes. },
    }
J
  • Optimal Transport for Deep Generative Models: State of the Art and Research Challenges
    Huynh, Viet, Phung, Dinh and Zhao, He. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), pages 4450-4457, August 2021. (Survey Track). [ | | pdf]
    @INPROCEEDINGS { huynh_etal_ijcai21_optimal_transport,
        TITLE = { Optimal Transport for Deep Generative Models: State of the Art and Research Challenges },
        AUTHOR = { Huynh, Viet and Phung, Dinh and Zhao, He },
        BOOKTITLE = { Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI-21} },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        EDITOR = { Zhi-Hua Zhou },
        PAGES = { 4450--4457 },
        YEAR = { 2021 },
        MONTH = { 8 },
        NOTE = { Survey Track },
        DOI = { 10.24963/ijcai.2021/607 },
        URL = { https://doi.org/10.24963/ijcai.2021/607 },
    }
C
  • TIDOT: A Teacher Imitation Learning Approach for Domain Adaptation with Optimal Transport
    Nguyen, Tuan, Le, Trung, Dam, Nhan, Tran, Quan Hung, Nguyen, Truyen and Phung, Dinh. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), pages 2862-2868, August 2021. (Main Track). [ | | pdf]
    @INPROCEEDINGS { nguyen_etal_ijcai21_tidot,
        TITLE = { TIDOT: A Teacher Imitation Learning Approach for Domain Adaptation with Optimal Transport },
        AUTHOR = { Nguyen, Tuan and Le, Trung and Dam, Nhan and Tran, Quan Hung and Nguyen, Truyen and Phung, Dinh },
        BOOKTITLE = { Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI-21} },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        EDITOR = { Zhi-Hua Zhou },
        PAGES = { 2862--2868 },
        YEAR = { 2021 },
        MONTH = { 8 },
        NOTE = { Main Track },
        DOI = { 10.24963/ijcai.2021/394 },
        URL = { https://doi.org/10.24963/ijcai.2021/394 },
    }
C
  • On Efficient Multilevel Clustering via Wasserstein Distances
    Viet Huynh, Nhat Ho, Nhan Dam, XuanLong Nguyen, Mikhail Yurochkin, Hung Bui and Dinh Phung. Journal of Machine Learning Research, 22(145):1-43, 2021. [ | | pdf]
    @ARTICLE { viet_etal_jmlr21_on_efficient_multilevel,
        AUTHOR = { Viet Huynh and Nhat Ho and Nhan Dam and XuanLong Nguyen and Mikhail Yurochkin and Hung Bui and Dinh Phung },
        TITLE = { On Efficient Multilevel Clustering via Wasserstein Distances },
        JOURNAL = { Journal of Machine Learning Research },
        YEAR = { 2021 },
        VOLUME = { 22 },
        NUMBER = { 145 },
        PAGES = { 1--43 },
        URL = { http://jmlr.org/papers/v22/19-782.html },
    }
J
  • The Monash Autism-ADHD genetics and neurodevelopment (MAGNET) project design and methodologies: a dimensional approach to understanding neurobiological and genetic aetiology
    Knott Rachael, Johnson Beth P., Tiego Jeggan, Mellahn Olivia, Finlay Amy, Kallady Kathryn, Kouspos Maria, Mohanakumar Sindhu Vishnu Priya, Hawi Ziarih, Arnatkeviciute Aurina, Chau Tracey, Maron Dalia, Mercieca Emily-Clare, Furley Kirsten, Harris Katrina, Williams Katrina, Ure Alexandra, Fornito Alex, Gray Kylie, Coghill David, Nicholson Ann, Phung Dinh, Loth Eva, Mason Luke, Murphy Declan, Buitelaar Jan and Bellgrove Mark A.. Molecular Autism, 12(1):55, Aug 2021. [ | | pdf]
    ASD and ADHD are prevalent neurodevelopmental disorders that frequently co-occur and have strong evidence for a degree of shared genetic aetiology. Behavioural and neurocognitive heterogeneity in ASD and ADHD has hampered attempts to map the underlying genetics and neurobiology, predict intervention response, and improve diagnostic accuracy. Moving away from categorical conceptualisations of psychopathology to a dimensional approach is anticipated to facilitate discovery of data-driven clusters and enhance our understanding of the neurobiological and genetic aetiology of these conditions. The Monash Autism-ADHD genetics and neurodevelopment (MAGNET) project is one of the first large-scale, family-based studies to take a truly transdiagnostic approach to ASD and ADHD. Using a comprehensive phenotyping protocol capturing dimensional traits central to ASD and ADHD, the MAGNET project aims to identify data-driven clusters across ADHD-ASD spectra using deep phenotyping of symptoms and behaviours; investigate the degree of familiality for different dimensional ASD-ADHD phenotypes and clusters; and map the neurocognitive, brain imaging, and genetic correlates of these data-driven symptom-based clusters.
    @ARTICLE { knott_etal_MolecularAutism21_the_monash_autism,
        AUTHOR = { Knott Rachael and Johnson Beth P. and Tiego Jeggan and Mellahn Olivia and Finlay Amy and Kallady Kathryn and Kouspos Maria and Mohanakumar Sindhu Vishnu Priya and Hawi Ziarih and Arnatkeviciute Aurina and Chau Tracey and Maron Dalia and Mercieca Emily-Clare and Furley Kirsten and Harris Katrina and Williams Katrina and Ure Alexandra and Fornito Alex and Gray Kylie and Coghill David and Nicholson Ann and Phung Dinh and Loth Eva and Mason Luke and Murphy Declan and Buitelaar Jan and Bellgrove Mark A. },
        TITLE = { The Monash Autism-ADHD genetics and neurodevelopment (MAGNET) project design and methodologies: a dimensional approach to understanding neurobiological and genetic aetiology },
        JOURNAL = { Molecular Autism },
        YEAR = { 2021 },
        MONTH = { Aug },
        DAY = { 05 },
        VOLUME = { 12 },
        NUMBER = { 1 },
        PAGES = { 55 },
        ABSTRACT = { ASD and ADHD are prevalent neurodevelopmental disorders that frequently co-occur and have strong evidence for a degree of shared genetic aetiology. Behavioural and neurocognitive heterogeneity in ASD and ADHD has hampered attempts to map the underlying genetics and neurobiology, predict intervention response, and improve diagnostic accuracy. Moving away from categorical conceptualisations of psychopathology to a dimensional approach is anticipated to facilitate discovery of data-driven clusters and enhance our understanding of the neurobiological and genetic aetiology of these conditions. The Monash Autism-ADHD genetics and neurodevelopment (MAGNET) project is one of the first large-scale, family-based studies to take a truly transdiagnostic approach to ASD and ADHD. Using a comprehensive phenotyping protocol capturing dimensional traits central to ASD and ADHD, the MAGNET project aims to identify data-driven clusters across ADHD-ASD spectra using deep phenotyping of symptoms and behaviours; investigate the degree of familiality for different dimensional ASD-ADHD phenotypes and clusters; and map the neurocognitive, brain imaging, and genetic correlates of these data-driven symptom-based clusters. },
        ISSN = { 2040-2392 },
        DOI = { 10.1186/s13229-021-00457-3 },
        URL = { https://doi.org/10.1186/s13229-021-00457-3 },
    }
J
2020
  • Robust Variational Learning for Multiclass Kernel Models with Stein Refinement
    Khanh Nguyen, Trung Le, Geoff Webb and Dinh Phung. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2020. [ | ]
    @ARTICLE { nguyen_etal_tkde20_robusvariational,
        AUTHOR = { Khanh Nguyen and Trung Le and Geoff Webb and Dinh Phung },
        JOURNAL = { IEEE Transactions on Knowledge and Data Engineering (TKDE) },
        TITLE = { Robust Variational Learning for Multiclass Kernel Models with Stein Refinement },
        YEAR = { 2020 },
    }
J
  • Explain by Evidence: An Explainable Memory-based Neural Network for Question Answering
    Quan Hung Tran, Nhan Dam, Tuan Lai, Franck Dernoncourt, Trung Le, Nham Le and Dinh Phung. In Proc. of the 28th Int. Conf. on Computational Linguistics (COLING), 2020. [ | | pdf]
    Interpretability and explainability of deep neural networks are challenging due to their scale, complexity, and the agreeable notions on which the explaining process rests. Previous work, in particular, has focused on representing internal components of neural networks through human-friendly visuals and concepts. On the other hand, in real life, when making a decision, humans tend to rely on similar situations and/or associations in the past. Hence arguably, a promising approach to make the model transparent is to design it in a way such that the model explicitly connects the current sample with the seen ones, and bases its decision on these samples. Grounded on that principle, we propose in this paper an explainable, evidence-based memory network architecture, which learns to summarize the dataset and extract supporting evidence to make its decision. Our model achieves state-of-the-art performance on two popular question answering datasets (i.e. TrecQA and WikiQA). Via further analysis, we show that this model can reliably trace the errors it has made in the validation step to the training instances that might have caused these errors. We believe that this error-tracing capability provides significant benefit in improving dataset quality in many applications.
    @INPROCEEDINGS { tran_etal_coling20_explainbyevidence,
        AUTHOR = { Quan Hung Tran and Nhan Dam and Tuan Lai and Franck Dernoncourt and Trung Le and Nham Le and Dinh Phung },
        BOOKTITLE = { Proc. of the 28th Int. Conf. on Computational Linguistics (COLING) },
        TITLE = { Explain by Evidence: An Explainable Memory-based Neural Network for Question Answering },
        YEAR = { 2020 },
        ABSTRACT = { Interpretability and explainability of deep neural networks are challenging due to their scale, complexity, and the agreeable notions on which the explaining process rests. Previous work, in particular, has focused on representing internal components of neural networks through human-friendly visuals and concepts. On the other hand, in real life, when making a decision, humans tend to rely on similar situations and/or associations in the past. Hence arguably, a promising approach to make the model transparent is to design it in a way such that the model explicitly connects the current sample with the seen ones, and bases its decision on these samples. Grounded on that principle, we propose in this paper an explainable, evidence-based memory network architecture, which learns to summarize the dataset and extract supporting evidence to make its decision. Our model achieves state-of-the-art performance on two popular question answering datasets (i.e. TrecQA and WikiQA). Via further analysis, we show that this model can reliably trace the errors it has made in the validation step to the training instances that might have caused these errors. We believe that this error-tracing capability provides significant benefit in improving dataset quality in many applications. },
        FILE = { :tran_etal_coling20_explainbyevidence - Explain by Evidence_ an Explainable Memory Based Neural Network for Question Answering.pdf:PDF },
        URL = { https://arxiv.org/abs/2011.03096 },
    }
C
  • Transfer2Attack: Text Adversarial Attack with Cross-Domain Interpretability
    Mahmoud Hossam, Trung Le, He Zhao and Dinh Phung. In Proc. of the 25th Int. Conf. on Pattern Recognition (ICPR), 2020. [ | ]
    Training robust deep learning models is a critical challenge for downstream tasks. Research has shown that common downstream models can be easily fooled with adversarial inputs that look like the training data, but slightly perturbed, in a way imperceptible to humans. Understanding the behavior of natural language models under these attacks is crucial to better defend these models against such attacks. In the black-box attack setting, where no access to model parameters is available, the attacker can only query the output information from the targeted model to craft a successful attack. Current black-box state-of-the-art models are costly in both computational complexity and number of queries needed to craft successful adversarial examples. For real-world scenarios, the number of queries is critical, where fewer queries are desired to avoid suspicion towards an attacking agent. In this paper, we propose Transfer2Attack, a black-box adversarial attack on the text classification task, that employs cross-domain interpretability to reduce target model queries during attack. We show that our framework either achieves or out-performs attack rates of the state-of-the-art models, yet with lower query cost and higher efficiency.
    @INPROCEEDINGS { hossam_etal_icpr20_transfer2attack,
        AUTHOR = { Mahmoud Hossam and Trung Le and He Zhao and Dinh Phung },
        BOOKTITLE = { Proc. of the 25th Int. Conf. on Pattern Recognition (ICPR) },
        TITLE = { {Transfer2Attack}: Text Adversarial Attack with Cross-Domain Interpretability },
        YEAR = { 2020 },
        ABSTRACT = { Training robust deep learning models is a critical challenge for downstream tasks. Research has shown that common downstream models can be easily fooled with adversarial inputs that look like the training data, but slightly perturbed, in a way imperceptible to humans. Understanding the behavior of natural language models under these attacks is crucial to better defend these models against such attacks. In the black-box attack setting, where no access to model parameters is available, the attacker can only query the output information from the targeted model to craft a successful attack. Current black-box state-of-the-art models are costly in both computational complexity and number of queries needed to craft successful adversarial examples. For real-world scenarios, the number of queries is critical, where fewer queries are desired to avoid suspicion towards an attacking agent. In this paper, we propose Transfer2Attack, a black-box adversarial attack on the text classification task, that employs cross-domain interpretability to reduce target model queries during attack. We show that our framework either achieves or out-performs attack rates of the state-of-the-art models, yet with lower query cost and higher efficiency. },
        FILE = { :hossam_etal_icpr20_transfer2attack - Transfer2Attack_ Text Adversarial Attack with Cross Domain Interpretability.pdf:PDF },
    }
C
  • A Capsule Network-based Model for Learning Node Embeddings
    Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of the 29th ACM Int. Conf. on Information and Knowledge Management (CIKM), 2020. (Our code is available at: \url{https://github.com/daiquocnguyen/Caps2NE}). [ | | pdf]
    In this paper, we focus on learning low-dimensional embeddings for nodes in graph-structured data. To achieve this, we propose Caps2NE -- a new unsupervised embedding model leveraging a network of two capsule layers. Caps2NE induces a routing process to aggregate feature vectors of context neighbors of a given target node at the first capsule layer, then feed these features into the second capsule layer to infer a plausible embedding for the target node. Experimental results show that our proposed Caps2NE obtains state-of-the-art performances on benchmark datasets for the node classification task.
    @INPROCEEDINGS { nguyen_etal_cikm20_capsule,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung },
        BOOKTITLE = { Proc. of the 29th ACM Int. Conf. on Information and Knowledge Management (CIKM) },
        TITLE = { A Capsule Network-based Model for Learning Node Embeddings },
        YEAR = { 2020 },
        NOTE = { Our code is available at: \url{https://github.com/daiquocnguyen/Caps2NE} },
        ABSTRACT = { In this paper, we focus on learning low-dimensional embeddings for nodes in graph-structured data. To achieve this, we propose Caps2NE -- a new unsupervised embedding model leveraging a network of two capsule layers. Caps2NE induces a routing process to aggregate feature vectors of context neighbors of a given target node at the first capsule layer, then feed these features into the second capsule layer to infer a plausible embedding for the target node. Experimental results show that our proposed Caps2NE obtains state-of-the-art performances on benchmark datasets for the node classification task. },
        FILE = { :nguyen_etal_cikm20_capsule - A Capsule Network Based Model for Learning Node Embeddings.pdf:PDF },
        URL = { https://arxiv.org/abs/1911.04822 },
    }
C
  • A Self-Attention Network based Node Embedding Model
    Dai Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung. In Proc. of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2020. [ | ]
    Despite recent progress, limited research has been conducted on the inductive setting, where embeddings are required for newly unseen nodes – a setting encountered commonly in practical applications of deep learning for graph networks. This significantly affects the performance of downstream tasks such as node classification, link prediction or community extraction. To this end, we propose SANNE – a novel unsupervised embedding model – whose central idea is to employ a self-attention mechanism followed by a feed-forward network, in order to iteratively aggregate vector representations of nodes in sampled random walks. As a consequence, SANNE can produce plausible embeddings not only for present nodes, but also for newly unseen nodes. Experimental results show that our unsupervised SANNE obtains state-of-the-art results for the node classification task on benchmark datasets.
    @INPROCEEDINGS { nguyen_etal_ecml20_selfattention,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dinh Phung },
        BOOKTITLE = { Proc. of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) },
        TITLE = { A Self-Attention Network based Node Embedding Model },
        YEAR = { 2020 },
        ABSTRACT = { Despite recent progress, limited research has been conducted on the inductive setting, where embeddings are required for newly unseen nodes – a setting encountered commonly in practical applications of deep learning for graph networks. This significantly affects the performance of downstream tasks such as node classification, link prediction or community extraction. To this end, we propose SANNE – a novel unsupervised embedding model – whose central idea is to employ a self-attention mechanism followed by a feed-forward network, in order to iteratively aggregate vector representations of nodes in sampled random walks. As a consequence, SANNE can produce plausible embeddings not only for present nodes, but also for newly unseen nodes. Experimental results show that our unsupervised SANNE obtains state-of-the-art results for the node classification task on benchmark datasets. },
    }
C
  • Parameterized Rate-Distortion Stochastic Encoder
    Quan Hoang, Trung Le and Dinh Phung. In Proc. of the 37th International Conference on Machine Learning (ICML), 2020. [ | ]
    @INPROCEEDINGS { hoang_etal_icml20_parameterized,
        AUTHOR = { Quan Hoang and Trung Le and Dinh Phung },
        BOOKTITLE = { Proc. of the 37th International Conference on Machine Learning (ICML) },
        TITLE = { Parameterized Rate-Distortion Stochastic Encoder },
        YEAR = { 2020 },
    }
C
  • Deep Generative Models of Sparse and Overdispersed Discrete Data
    He Zhao, Piyush Rai, Lan Du, Wray Buntine, Dinh Phung and Mingyuan Zhou. In Proc of the 23rd Int. Conf. on Artificial Intelligence and Statistics (AISTATS), 2020. [ | | pdf]
    In this paper, we propose a variational autoencoder based framework that generates discrete data, including both count-valued and binary data, via the negative-binomial distribution. We also examine the model’s ability to capture self- and cross-excitations in discrete data, which are critical for modelling overdispersion. We conduct extensive experiments on text analysis and collaborative filtering. Compared with several state-of-the-art baselines, the proposed models achieve significantly better performance on the above problems. By achieving superior modelling performance with a simple yet effective Bayesian extension to VAEs, we demonstrate that it is feasible to adapt the knowledge and experience of Bayesian probabilistic matrix factorisation into newly-developed deep generative models.
    @INPROCEEDINGS { zhao_etal_aistats20_deepgenerative,
        AUTHOR = { He Zhao and Piyush Rai and Lan Du and Wray Buntine and Dinh Phung and Mingyuan Zhou },
        TITLE = { Deep Generative Models of Sparse and Overdispersed Discrete Data },
        BOOKTITLE = { Proc of the 23rd Int. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2020 },
        ABSTRACT = { In this paper, we propose a variational autoencoder based framework that generates discrete data, including both count-valued and binary data, via the negative-binomial distribution. We also examine the model’s ability to capture self- and cross-excitations in discrete data, which are critical for modelling overdispersion. We conduct extensive experiments on text analysis and collaborative filtering. Compared with several state-of-the-art baselines, the proposed models achieve significantly better performance on the above problems. By achieving superior modelling performance with a simple yet effective Bayesian extension to VAEs, we demonstrate that it is feasible to adapt the knowledge and experience of Bayesian probabilistic matrix factorisation into newly-developed deep generative models. },
        FILE = { :zhao_etal_aistats20_deepgenerative - Deep Generative Models of Sparse and Overdispersed Discrete Data.pdf:PDF },
        URL = { https://www.semanticscholar.org/paper/Deep-Generative-Models-of-Sparse-and-Overdispersed-Zhao-Rai/8136c46488875b09e15e89c08bf02698901322a1 },
    }
C
  • A Relational Memory-based Embedding Model for Triple Classification and Search Personalization
    Dai Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung. In Proc. of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020. [ | | pdf]
    Knowledge graph embedding methods often suffer from a limitation of memorizing valid triples to predict new ones for triple classification and search personalization problems. To this end, we introduce a novel embedding model, named R-MeN, that explores a relational memory network to encode potential dependencies in relationship triples. R-MeN considers each triple as a sequence of 3 input vectors that recurrently interact with a memory using a transformer self-attention mechanism. Thus R-MeN encodes new information from interactions between the memory and each input vector to return a corresponding vector. Consequently, R-MeN feeds these 3 returned vectors to a convolutional neural network-based decoder to produce a scalar score for the triple. Experimental results show that our proposed R-MeN obtains state-of-the-art results on SEARCH17 for the search personalization task, and on WN11 and FB13 for the triple classification task.
    @INPROCEEDINGS { nguyen_etal_acl9_relational,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dinh Phung },
        BOOKTITLE = { Proc. of the 58th Annual Meeting of the Association for Computational Linguistics (ACL) },
        TITLE = { A Relational Memory-based Embedding Model for Triple Classification and Search Personalization },
        YEAR = { 2020 },
        ABSTRACT = { Knowledge graph embedding methods often suffer from a limitation of memorizing valid triples to predict new ones for triple classification and search personalization problems. To this end, we introduce a novel embedding model, named R-MeN, that explores a relational memory network to encode potential dependencies in relationship triples. R-MeN considers each triple as a sequence of 3 input vectors that recurrently interact with a memory using a transformer self-attention mechanism. Thus R-MeN encodes new information from interactions between the memory and each input vector to return a corresponding vector. Consequently, R-MeN feeds these 3 returned vectors to a convolutional neural network-based decoder to produce a scalar score for the triple. Experimental results show that our proposed R-MeN obtains state-of-the-art results on SEARCH17 for the search personalization task, and on WN11 and FB13 for the triple classification task. },
        FILE = { :nguyen_etal_acl9_relational - A Relational Memory Based Embedding Model for Triple Classification and Search Personalization.PDF:PDF },
        URL = { https://arxiv.org/abs/1907.06080 },
    }
C
  • Stein variational gradient descent with variance reduction
    Nhan Dam, Trung Le, Viet Huynh and Dinh Phung. In Proc. of the 2020 Int. Joint Conference on Neural Networks (IJCNN), jul 2020. [ | ]
    Probabilistic inference is a common and important task in statistical machine learning. The recently proposed Stein variational gradient descent (SVGD) is a generic Bayesian inference method that has been shown to be successfully applied in a wide range of contexts, especially in dealing with large datasets, where existing probabilistic inference methods have been known to be ineffective. In a large-scale data setting, SVGD employs the mini-batch strategy but its mini-batch estimator has large variance, hence compromising its estimation quality in practice. To this end, we propose in this paper a generic SVGD-based inference method that can significantly reduce the variance of mini-batch estimator when working with large datasets. Our experiments on 14 datasets show that the proposed method enjoys substantial and consistent improvements compared with baseline methods in binary classification task and its pseudo-online learning setting, and regression task. Furthermore, our framework is generic and applicable to a wide range of probabilistic inference problems such as in Bayesian neural networks and Markov random fields.
    @INPROCEEDINGS { dam_etal_ijcnn20_steinvariational,
        AUTHOR = { Nhan Dam and Trung Le and Viet Huynh and Dinh Phung },
        BOOKTITLE = { Proc. of the 2020 Int. Joint Conference on Neural Networks (IJCNN) },
        TITLE = { Stein variational gradient descent with variance reduction },
        YEAR = { 2020 },
        MONTH = { jul },
        ABSTRACT = { Probabilistic inference is a common and important task in statistical machine learning. The recently proposed Stein variational gradient descent (SVGD) is a generic Bayesian inference method that has been shown to be successfully applied in a wide range of contexts, especially in dealing with large datasets, where existing probabilistic inference methods have been known to be ineffective. In a large-scale data setting, SVGD employs the mini-batch strategy but its mini-batch estimator has large variance, hence compromising its estimation quality in practice. To this end, we propose in this paper a generic SVGD-based inference method that can significantly reduce the variance of mini-batch estimator when working with large datasets. Our experiments on 14 datasets show that the proposed method enjoys substantial and consistent improvements compared with baseline methods in binary classification task and its pseudo-online learning setting, and regression task. Furthermore, our framework is generic and applicable to a wide range of probabilistic inference problems such as in Bayesian neural networks and Markov random fields. },
        FILE = { :dam_etal_ijcnn20_steinvariational - Stein Variational Gradient Descent with Variance Reduction.pdf:PDF },
    }
C
  • OptiGAN: Generative Adversarial Networks for Goal Optimized Sequence Generation
    Mahmoud Hossam, Trung Le, Viet Huynh, Michael Papasimeon and Dinh Phung. In Proc. of the International Joint Conference on Neural Networks (IJCNN), 2020. [ | ]
    One of the challenging problems in sequence generation tasks is the optimized generation of sequences with specific desired goals. Existing sequential generative models mainly generate sequences to closely mimic the training data, without direct optimization according to desired goals or properties specific to the task. In this paper, we propose OptiGAN, a generative GAN-based model that incorporates both Generative Adversarial Networks and Reinforcement Learning (RL) to optimize desired goal scores using policy gradients. We apply our model to text and sequence generation, where our model is able to achieve higher scores out-performing selected GAN and RL baselines, while not sacrificing output sample diversity.
    @INPROCEEDINGS { hossam_etal_ijcnn20_OptiGAN,
        AUTHOR = { Mahmoud Hossam and Trung Le and Viet Huynh and Michael Papasimeon and Dinh Phung },
        BOOKTITLE = { Proc. of the International Joint Conference on Neural Networks (IJCNN) },
        TITLE = { OptiGAN: Generative Adversarial Networks for Goal Optimized Sequence Generation },
        YEAR = { 2020 },
        ABSTRACT = { One of the challenging problems in sequence generation tasks is the optimized generation of sequences with specific desired goals. Existing sequential generative models mainly generate sequences to closely mimic the training data, without direct optimization according to desired goals or properties specific to the task. In this paper, we propose OptiGAN, a generative GAN-based model that incorporates both Generative Adversarial Networks and Reinforcement Learning (RL) to optimize desired goal scores using policy gradients. We apply our model to text and sequence generation, where our model is able to achieve higher scores out-performing selected GAN and RL baselines, while not sacrificing output sample diversity. },
        FILE = { :hossam_etal_ijcnn20_OptiGAN - OptiGAN_ Generative Adversarial Networks for Goal Optimized Sequence Generation.pdf:PDF },
    }
C
  • Code Pointer Network for Binary Function Scope Identification
    Van Nguyen, Trung Le, Tue Le, Khanh Nguyen, Olivier de Vel, Paul Montague and Dinh Phung. In Proc. of the International Joint Conference on Neural Networks (IJCNN), 2020. [ | ]
    Function identification is a preliminary step in binary analysis for many extensive applications from malware detection, common vulnerability detection and binary instrumentation to name a few. In this paper, we propose the Code Pointer Network that leverages the underlying idea of a pointer network to efficiently and effectively tackle function scope identification – the hardest and most crucial task in function identification. We establish extensive experiments to compare our proposed method with the deep learning based baseline. Experimental results demonstrate that our proposed method significantly outperforms the state-of-the-art baseline in terms of both predictive performance and running time.
    @INPROCEEDINGS { nguyen_etal_ijcnn20_codepointer,
        AUTHOR = { Van Nguyen and Trung Le and Tue Le and Khanh Nguyen and Olivier de Vel and Paul Montague and Dinh Phung },
        BOOKTITLE = { Proc. of the International Joint Conference on Neural Networks (IJCNN) },
        TITLE = { Code Pointer Network for Binary Function Scope Identification },
        YEAR = { 2020 },
        ABSTRACT = { Function identification is a preliminary step in binary analysis for many extensive applications from malware detection, common vulnerability detection and binary instrumentation to name a few. In this paper, we propose the Code Pointer Network that leverages the underlying idea of a pointer network to efficiently and effectively tackle function scope identification – the hardest and most crucial task in function identification. We establish extensive experiments to compare our proposed method with the deep learning based baseline. Experimental results demonstrate that our proposed method significantly outperforms the state-of-the-art baseline in terms of both predictive performance and running time. },
        FILE = { :nguyen_etal_ijcnn20_codepointer - Code Pointer Network for Binary Function Scope Identification.pdf:PDF },
    }
C
  • Dual-Component Deep Domain Adaptation: A New Approach for Cross Project Software Vulnerability Detection
    Van Nguyen, Trung Le, Olivier de Vel, Paul Montague, John Grundy and Dinh Phung. In Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2020. [ | ]
    Owing to the ubiquity of computer software, software vulnerability detection (SVD) has become an important problem in the software industry and computer security. One of the most crucial issues in SVD is coping with the scarcity of labeled vulnerabilities in projects that require the laborious manual labeling of code by software security experts. One possible solution is to employ deep domain adaptation (DA) which has recently witnessed enormous success in transferring learning from structural labeled to unlabeled data sources. Generative adversarial network (GAN) is a technique that attempts to bridge the gap between source and target data in the joint space and emerges as a building block to develop deep DA approaches with state-of-the-art performance. However, deep DA approaches using the GAN principle to close the gap are subject to the mode collapsing problem that negatively impacts the predictive performance. Our aim in this paper is to propose Dual Generator-Discriminator Deep Code Domain Adaptation Network (Dual-GD-DDAN) for tackling the problem of transfer learning from labeled to unlabeled software projects in SVD to resolve the mode collapsing problem faced in previous approaches. The experimental results on real-world software projects show that our method outperforms state-of-the-art baselines by a wide margin.
    @INPROCEEDINGS { nguyen_etal_pakdd20_dualcomponent,
        AUTHOR = { Van Nguyen and Trung Le and Olivier de Vel and Paul Montague and John Grundy and Dinh Phung },
        BOOKTITLE = { Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        TITLE = { Dual-Component Deep Domain Adaptation: A New Approach for Cross Project Software Vulnerability Detection },
        YEAR = { 2020 },
        ABSTRACT = { Owing to the ubiquity of computer software, software vulnerability detection (SVD) has become an important problem in the software industry and computer security. One of the most crucial issues in SVD is coping with the scarcity of labeled vulnerabilities in projects that require the laborious manual labeling of code by software security experts. One possible solution is to employ deep domain adaptation (DA) which has recently witnessed enormous success in transferring learning from structural labeled to unlabeled data sources. Generative adversarial network (GAN) is a technique that attempts to bridge the gap between source and target data in the joint space and emerges as a building block to develop deep DA approaches with state-of-the-art performance. However, deep DA approaches using the GAN principle to close the gap are subject to the mode collapsing problem that negatively impacts the predictive performance. Our aim in this paper is to propose Dual Generator-Discriminator Deep Code Domain Adaptation Network (Dual-GD-DDAN) for tackling the problem of transfer learning from labeled to unlabeled software projects in SVD to resolve the mode collapsing problem faced in previous approaches. The experimental results on real-world software projects show that our method outperforms state-of-the-art baselines by a wide margin. },
        FILE = { :nguyen_etal_pakdd20_dualcomponent - Dual Component Deep Domain Adaptation_ a New Approach for Cross Project Software Vulnerability Detection.pdf:PDF },
    }
C
  • Code Action Network for Binary Function Scope Identification
    Van Nguyen, Trung Le, Tue Le, Khanh Nguyen, Olivier de Vel, Paul Montague, John Grundy and Dinh Phung. In Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2020. [ | ]
    Function identification is a preliminary step in binary analysis for many applications from malware detection, common vulnerability detection and binary instrumentation to name a few. In this paper, we propose the Code Action Network (CAN) whose key idea is to encode the task of function scope identification to a sequence of three action states NI (i.e., next inclusion), NE (i.e., next exclusion), and FE (i.e., function end) to efficiently and effectively tackle function scope identification, the hardest and most crucial task in function identification. A bidirectional Recurrent Neural Network is trained to match binary programs with their sequence of action states. To work out function scopes in a binary, this binary is first fed to a trained CAN to output its sequence of action states which can be further decoded to know the function scopes in the binary. We undertake extensive experiments to compare our proposed method with other state-of-the-art baselines. Experimental results demonstrate that our proposed method outperforms the state-of-the-art baselines in terms of predictive performance on real-world datasets which include binaries from well-known libraries.
    @INPROCEEDINGS { nguyen_etal_pakdd20_codeaction,
        AUTHOR = { Van Nguyen and Trung Le and Tue Le and Khanh Nguyen and Olivier de Vel and Paul Montague and John Grundy and Dinh Phung },
        BOOKTITLE = { Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        TITLE = { Code Action Network for Binary Function Scope Identification },
        YEAR = { 2020 },
        ABSTRACT = { Function identification is a preliminary step in binary analysis for many applications from malware detection, common vulnerability detection and binary instrumentation to name a few. In this paper, we propose the Code Action Network (CAN) whose key idea is to encode the task of function scope identification to a sequence of three action states NI (i.e., next inclusion), NE (i.e., next exclusion), and FE (i.e., function end) to efficiently and effectively tackle function scope identification, the hardest and most crucial task in function identification. A bidirectional Recurrent Neural Network is trained to match binary programs with their sequence of action states. To work out function scopes in a binary, this binary is first fed to a trained CAN to output its sequence of action states which can be further decoded to know the function scopes in the binary. We undertake extensive experiments to compare our proposed method with other state-of-the-art baselines. Experimental results demonstrate that our proposed method outperforms the state-of-the-art baselines in terms of predictive performance on real-world datasets which include binaries from well-known libraries. },
        FILE = { :nguyen_etal_pakdd20_codeaction - Code Action Network for Binary Function Scope Identification.pdf:PDF },
    }
C
  • Deep Cost-sensitive Kernel Machine for Binary Software Vulnerability Detection
    Tuan Nguyen, Trung Le, Khanh Nguyen, Olivier de Vel, Paul Montague, John C. Grundy and Dinh Phung. In Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2020. [ | ]
    Owing to the sharp rise in the severity of the threats imposed by software vulnerabilities, software vulnerability detection has become an important concern in the software industry, such as the embedded systems industry, and in the field of computer security. Software vulnerability detection can be carried out at the source code or binary level. However, the latter is more impactful and practical since when using commercial software, we usually only possess binary software. In this paper, we leverage deep learning and kernel methods to propose the Deep Cost-sensitive Kernel Machine, a method that inherits the advantages of deep learning methods in efficiently tackling structural data and kernel methods in learning the characteristic of vulnerable binary examples with high generalization capacity. We conduct experiments on two real-world binary datasets. The experimental results have shown a convincing outperformance of our proposed method over the baselines.
    @INPROCEEDINGS { nguyen_etal_pakdd20_deepcost,
        AUTHOR = { Tuan Nguyen and Trung Le and Khanh Nguyen and Olivier de Vel and Paul Montague and John C Grundy and Dinh Phung },
        BOOKTITLE = { Proc. of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        TITLE = { Deep Cost-sensitive Kernel Machine for Binary Software Vulnerability Detection },
        YEAR = { 2020 },
        ABSTRACT = { Owing to the sharp rise in the severity of the threats imposed by software vulnerabilities, software vulnerability detection has become an important concern in the software industry, such as the embedded systems industry, and in the field of computer security. Software vulnerability detection can be carried out at the source code or binary level. However, the latter is more impactful and practical since when using commercial software, we usually only possess binary software. In this paper, we leverage deep learning and kernel methods to propose the Deep Cost-sensitive Kernel Machine, a method that inherits the advantages of deep learning methods in efficiently tackling structural data and kernel methods in learning the characteristic of vulnerable binary examples with high generalization capacity. We conduct experiments on two real-world binary datasets. The experimental results have shown a convincing outperformance of our proposed method over the baselines. },
        FILE = { :nguyen_etal_pakdd20_deepcost - Deep Cost Sensitive Kernel Machine for Binary Software Vulnerability Detection.pdf:PDF },
    }
C
  • Effective Unsupervised Domain Adaptation with Adversarially Trained Language Models
    Vu Thuy-Trang, Phung Dinh and Haffari Gholamreza. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6163-6173, Online, nov 2020. [ | | pdf]
    Recent work has shown the importance of adaptation of broad-coverage contextualised embedding models on the domain of the target task of interest. Current self-supervised adaptation methods are simplistic, as the training signal comes from a small percentage of randomly masked-out tokens. In this paper, we show that careful masking strategies can bridge the knowledge gap of masked language models (MLMs) about the domains more effectively by allocating self-supervision where it is needed. Furthermore, we propose an effective training strategy by adversarially masking out those tokens which are harder to reconstruct by the underlying MLM. The adversarial objective leads to a challenging combinatorial optimisation problem over subsets of tokens, which we tackle efficiently through relaxation to a variational lowerbound and dynamic programming. On six unsupervised domain adaptation tasks involving named entity recognition, our method strongly outperforms the random masking strategy and achieves up to +1.64 F1 score improvements.
    @INPROCEEDINGS { vu_etal_emnlp20_effective,
        TITLE = { Effective Unsupervised Domain Adaptation with Adversarially Trained Language Models },
        AUTHOR = { Vu Thuy-Trang and Phung Dinh and Haffari Gholamreza },
        BOOKTITLE = { Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) },
        MONTH = { nov },
        YEAR = { 2020 },
        ADDRESS = { Online },
        PUBLISHER = { Association for Computational Linguistics },
        URL = { https://aclanthology.org/2020.emnlp-main.497 },
        DOI = { 10.18653/v1/2020.emnlp-main.497 },
        PAGES = { 6163--6173 },
        ABSTRACT = { Recent work has shown the importance of adaptation of broad-coverage contextualised embedding models on the domain of the target task of interest. Current self-supervised adaptation methods are simplistic, as the training signal comes from a small percentage of \textit{randomly} masked-out tokens. In this paper, we show that careful masking strategies can bridge the knowledge gap of masked language models (MLMs) about the domains more effectively by allocating self-supervision where it is needed. Furthermore, we propose an effective training strategy by adversarially masking out those tokens which are harder to reconstruct by the underlying MLM. The adversarial objective leads to a challenging combinatorial optimisation problem over \textit{subsets} of tokens, which we tackle efficiently through relaxation to a variational lowerbound and dynamic programming. On six unsupervised domain adaptation tasks involving named entity recognition, our method strongly outperforms the random masking strategy and achieves up to +1.64 F1 score improvements. },
    }
C
  • Improving Adversarial Robustness by Enforcing Local and Global Compactness
    Anh Bui, Trung Le, He Zhao, P. Montague, O. de Vel, T. Abraham and Dinh Phung. In Proc. of the European Conference on Computer Vision (ECCV), 2020. [ | ]
    The fact that deep neural networks are susceptible to crafted perturbations severely impacts the use of deep learning in certain domains of application. Among many developed defense models against such attacks, adversarial training emerges as the most successful method that consistently resists a wide range of attacks. In this work, based on an observation from a previous study that the representations of a clean data example and its adversarial examples become more divergent in higher layers of a deep neural net, we propose the Adversary Divergence Reduction Network which enforces local/global compactness and the clustering assumption over an intermediate layer of a deep neural network. We conduct comprehensive experiments to understand the isolating behavior of each component (i.e., local/global compactness and the clustering assumption) and compare our proposed model with state-of-the-art adversarial training methods. The experimental results demonstrate that augmenting adversarial training with our proposed components can further improve the robustness of the network, leading to higher unperturbed and adversarial predictive performances.
    @INPROCEEDINGS { bui_etal_eccv20_improving,
        AUTHOR = { Anh Bui and Trung Le and He Zhao and P. Montague and O. de Vel and T. Abraham and Dinh Phung },
        BOOKTITLE = { Proc. of the European Conference on Computer Vision (ECCV) },
        TITLE = { Improving Adversarial Robustness by Enforcing Local and Global Compactness },
        YEAR = { 2020 },
        ABSTRACT = { The fact that deep neural networks are susceptible to crafted perturbations severely impacts the use of deep learning in certain domains of application. Among many developed defense models against such attacks, adversarial training emerges as the most successful method that consistently resists a wide range of attacks. In this work, based on an observation from a previous study that the representations of a clean data example and its adversarial examples become more divergent in higher layers of a deep neural net, we propose the Adversary Divergence Reduction Network which enforces local/global compactness and the clustering assumption over an intermediate layer of a deep neural network. We conduct comprehensive experiments to understand the isolating behavior of each component (i.e., local/global compactness and the clustering assumption) and compare our proposed model with state-of-the-art adversarial training methods. The experimental results demonstrate that augmenting adversarial training with our proposed components can further improve the robustness of the network, leading to higher unperturbed and adversarial predictive performances. },
    }
C
  • Using spatiotemporal distribution of geocoded Twitter data to predict US county-level health indices
    Thin Nguyen, Mark Larsen, Bridianne O’Dea, Hung Nguyen, Duc Thanh Nguyen, John Yearwood, Dinh Phung, Svetha Venkatesh and Helen Christensen. Future Generation Computer Systems, 110:620-628, 2020. [ | | pdf]
    For more than three decades, the US has annually conducted Behavioral Risk Factor Surveillance System (BRFSS) surveys to capture health behavior and health status of its people. Though this kind of information at population level is important for local governments to identify local needs, traditional datasets take several years to collate and to become publicly available. Geocoded social media data can provide an alternative reflection of local health trends. Due to the large scale of data, such as approximately two billion tweets in this work, aggregating the tweets at a population level is common practice. While alleviating the computational cost, the aggregation operation would result in the loss of information on the distribution of data over the population, and such information may be important for identifying the health behavior and health outcomes of the population. In this work, we propose statistical features constructed on top of primary features to predict county-level health indices. The primary features include topics and linguistic patterns extracted from tweets with county-decoded information. In addition, tweeting behaviors, particularly tweeting time, are used as a predictor of the health indices. Apache Spark, an advanced cluster computing paradigm, was employed to efficiently process the large corpus of tweets, including geo-decoding the geotags, extracting low-level (primary) features, and computing the statistical features. The results show strong correlations between publicly available health indices and the features extracted from geospatially coded Twitter data. Statistical features gained higher correlation coefficients than did the aggregation ones, suggesting the potential and applicability of the proposed features in a wide spectrum of applications on data analytics at population levels. In addition, the prediction performance was also improved when the temporal information was employed. This demonstrates that the real-time analysis of social media data can provide timely insights into the health of populations.
    @ARTICLE { thin_etal_fgcs20_using_spatiotemporal,
        TITLE = { Using spatiotemporal distribution of geocoded Twitter data to predict US county-level health indices },
        JOURNAL = { Future Generation Computer Systems },
        VOLUME = { 110 },
        PAGES = { 620--628 },
        YEAR = { 2020 },
        ISSN = { 0167-739X },
        DOI = { 10.1016/j.future.2018.01.014 },
        URL = { https://www.sciencedirect.com/science/article/pii/S0167739X17312487 },
        AUTHOR = { Thin Nguyen and Mark Larsen and Bridianne O’Dea and Hung Nguyen and Duc Thanh Nguyen and John Yearwood and Dinh Phung and Svetha Venkatesh and Helen Christensen },
        KEYWORDS = { Mining spatial and temporal data, Statistical features, Spatio-temporal features, Cluster computing, Large-scale parallel and distributed implementation, Apache Spark },
        ABSTRACT = { For more than three decades, the US has annually conducted Behavioral Risk Factor Surveillance System (BRFSS) surveys to capture health behavior and health status of its people. Though this kind of information at population level is important for local governments to identify local needs, traditional datasets take several years to collate and to become publicly available. Geocoded social media data can provide an alternative reflection of local health trends. Due to the large scale of data, such as approximately two billion tweets in this work, aggregating the tweets at a population level is common practice. While alleviating the computational cost, the aggregation operation would result in the loss of information on the distribution of data over the population, and such information may be important for identifying the health behavior and health outcomes of the population. In this work, we propose statistical features constructed on top of primary features to predict county-level health indices. The primary features include topics and linguistic patterns extracted from tweets with county-decoded information. In addition, tweeting behaviors, particularly tweeting time, are used as a predictor of the health indices. Apache Spark, an advanced cluster computing paradigm, was employed to efficiently process the large corpus of tweets, including geo-decoding the geotags, extracting low-level (primary) features, and computing the statistical features. The results show strong correlations between publicly available health indices and the features extracted from geospatially coded Twitter data. Statistical features gained higher correlation coefficients than did the aggregation ones, suggesting the potential and applicability of the proposed features in a wide spectrum of applications on data analytics at population levels. In addition, the prediction performance was also improved when the temporal information was employed. This demonstrates that the real-time analysis of social media data can provide timely insights into the health of populations. },
    }
J
  • Variational Autoencoders for Sparse and Overdispersed Discrete Data
    Zhao, He, Rai, Piyush, Du, Lan, Buntine, Wray, Phung, Dinh and Zhou, Mingyuan. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, pages 1684-1694, 26-28 Aug 2020. [ | | pdf]
    Many applications, such as text modelling, high-throughput sequencing, and recommender systems, require analysing sparse, high-dimensional, and overdispersed discrete (count or binary) data. Recent deep probabilistic models based on variational autoencoders (VAE) have shown promising results on discrete data but may have inferior modelling performance due to the insufficient capability in modelling overdispersion and model misspecification. To address these issues, we develop a VAE-based framework using the negative binomial distribution as the data distribution. We also provide an analysis of its properties vis-à-vis other models. We conduct extensive experiments on three problems from discrete data analysis: text analysis/topic modelling, collaborative filtering, and multi-label learning. Our models outperform state-of-the-art approaches on these problems, while also capturing the phenomenon of overdispersion more effectively.
    @INPROCEEDINGS { zhao_etal_pmlr20_variational_autoencoders,
        TITLE = { Variational Autoencoders for Sparse and Overdispersed Discrete Data },
        AUTHOR = { Zhao, He and Rai, Piyush and Du, Lan and Buntine, Wray and Phung, Dinh and Zhou, Mingyuan },
        BOOKTITLE = { Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics },
        PAGES = { 1684--1694 },
        YEAR = { 2020 },
        EDITOR = { Chiappa, Silvia and Calandra, Roberto },
        VOLUME = { 108 },
        SERIES = { Proceedings of Machine Learning Research },
        MONTH = { 26--28 Aug },
        PUBLISHER = { PMLR },
        PDF = { http://proceedings.mlr.press/v108/zhao20c/zhao20c.pdf },
        URL = { https://proceedings.mlr.press/v108/zhao20c.html },
        ABSTRACT = { Many applications, such as text modelling, high-throughput sequencing, and recommender systems, require analysing sparse, high-dimensional, and overdispersed discrete (count or binary) data. Recent deep probabilistic models based on variational autoencoders (VAE) have shown promising results on discrete data but may have inferior modelling performance due to the insufficient capability in modelling overdispersion and model misspecification. To address these issues, we develop a VAE-based framework using the negative binomial distribution as the data distribution. We also provide an analysis of its properties vis-à-vis other models. We conduct extensive experiments on three problems from discrete data analysis: text analysis/topic modelling, collaborative filtering, and multi-label learning. Our models outperform state-of-the-art approaches on these problems, while also capturing the phenomenon of overdispersion more effectively. },
    }
C
  • Pair-Based Uncertainty and Diversity Promoting Early Active Learning for Person Re-Identification
    Liu, Wenhe, Chang, Xiaojun, Chen, Ling, Phung, Dinh, Zhang, Xiaoqin, Yang, Yi and Hauptmann, Alexander G. ACM Trans. Intell. Syst. Technol., 11(2), jan 2020. [ | | pdf]
    The effective training of supervised Person Re-identification (Re-ID) models requires sufficient pairwise labeled data. However, when there is limited annotation resource, it is difficult to collect pairwise labeled data. We consider a challenging and practical problem called Early Active Learning, which is applied to the early stage of experiments when there is no pre-labeled sample available as references for human annotating. Previous early active learning methods suffer from two limitations for Re-ID. First, these instance-based algorithms select instances rather than pairs, which can result in missing optimal pairs for Re-ID. Second, most of these methods only consider the representativeness of instances, which can result in selecting less diverse and less informative pairs. To overcome these limitations, we propose a novel pair-based active learning for Re-ID. Our algorithm selects pairs instead of instances from the entire dataset for annotation. Besides representativeness, we further take into account the uncertainty and the diversity in terms of pairwise relations. Therefore, our algorithm can produce the most representative, informative, and diverse pairs for Re-ID data annotation. Extensive experimental results on five benchmark Re-ID datasets have demonstrated the superiority of the proposed pair-based early active learning algorithm.
    @ARTICLE { liu_etal_acmTIST20_pair_based_uncertainty,
        AUTHOR = { Liu, Wenhe and Chang, Xiaojun and Chen, Ling and Phung, Dinh and Zhang, Xiaoqin and Yang, Yi and Hauptmann, Alexander G. },
        TITLE = { Pair-Based Uncertainty and Diversity Promoting Early Active Learning for Person Re-Identification },
        YEAR = { 2020 },
        ISSUE_DATE = { April 2020 },
        PUBLISHER = { Association for Computing Machinery },
        ADDRESS = { New York, NY, USA },
        VOLUME = { 11 },
        NUMBER = { 2 },
        ISSN = { 2157-6904 },
        URL = { https://doi.org/10.1145/3372121 },
        DOI = { 10.1145/3372121 },
        ABSTRACT = { The effective training of supervised Person Re-identification (Re-ID) models requires sufficient pairwise labeled data. However, when there is limited annotation resource, it is difficult to collect pairwise labeled data. We consider a challenging and practical problem called Early Active Learning, which is applied to the early stage of experiments when there is no pre-labeled sample available as references for human annotating. Previous early active learning methods suffer from two limitations for Re-ID. First, these instance-based algorithms select instances rather than pairs, which can result in missing optimal pairs for Re-ID. Second, most of these methods only consider the representativeness of instances, which can result in selecting less diverse and less informative pairs. To overcome these limitations, we propose a novel pair-based active learning for Re-ID. Our algorithm selects pairs instead of instances from the entire dataset for annotation. Besides representativeness, we further take into account the uncertainty and the diversity in terms of pairwise relations. Therefore, our algorithm can produce the most representative, informative, and diverse pairs for Re-ID data annotation. Extensive experimental results on five benchmark Re-ID datasets have demonstrated the superiority of the proposed pair-based early active learning algorithm. },
        JOURNAL = { ACM Trans. Intell. Syst. Technol. },
        MONTH = { jan },
        ARTICLENO = { 21 },
        NUMPAGES = { 15 },
        KEYWORDS = { Active learning, person re-identification },
    }
J
2019
  • A Bayesian Extension to VAEs for Discrete Data
    He Zhao, Piyush Rai, Lan Du, Wray Buntine, Dinh Phung and Mingyuan Zhou. In Proc. of Bayesian Deep Learning (NeurIPS 2019 Workshop), dec 2019. [ | ]
    @INPROCEEDINGS { zhao_etal_bdl19_bayesianextension,
        AUTHOR = { He Zhao and Piyush Rai and Lan Du and Wray Buntine and Dinh Phung and Mingyuan Zhou },
        TITLE = { A Bayesian Extension to {VAE}s for Discrete Data },
        BOOKTITLE = { Proc. of Bayesian Deep Learning (NeurIPS 2019 Workshop) },
        YEAR = { 2019 },
        MONTH = { dec },
    }
C
  • Pair-based Uncertainty and Diversity Promoting Early Active Learning for Person Re-identification
    Wenhe Liu, Xiaojun Chang, Ling Chen, Dinh Phung, Yang Yi and Hauptmann Alexander. ACM Transactions on Intelligent Systems and Technology, 2019. [ | ]
    @ARTICLE { liu_etal_tist19_pairbased,
        AUTHOR = { Wenhe Liu and Xiaojun Chang and Ling Chen and Dinh Phung and Yang Yi and Hauptmann Alexander },
        TITLE = { Pair-based Uncertainty and Diversity Promoting Early Active Learning for Person Re-identification },
        JOURNAL = { ACM Transactions on Intelligent Systems and Technology },
        YEAR = { 2019 },
    }
J
  • An effective spatial-temporal attention based neural network for traffic flow prediction
    Loan N.N. Do, Hai L. Vu, Bao Q. Vo, Zhiyuan Liu and Dinh Phung. Transportation Research Part C: Emerging Technologies, 108:12-28, 2019. [ | | pdf]
    Due to its importance in Intelligent Transport Systems (ITS), traffic flow prediction has been the focus of many studies in the last few decades. Existing traffic flow prediction models mainly extract static spatial-temporal correlations, although these correlations are known to be dynamic in traffic networks. Attention-based models have emerged in recent years, mostly in the field of natural language processing, and have resulted in major progresses in terms of both accuracy and interpretability. This inspires us to introduce the application of attentions for traffic flow prediction. In this study, a deep learning based traffic flow predictor with spatial and temporal attentions (STANN) is proposed. The spatial and temporal attentions are used to exploit the spatial dependencies between road segments and temporal dependencies between time steps respectively. Experiment results with a real-world traffic dataset demonstrate the superior performance of the proposed model. The results also show that the utilization of multiple data resolutions could help improve prediction accuracy. Furthermore, the proposed model is demonstrated to have potential for improving the understanding of spatial-temporal correlations in a traffic network.
    @ARTICLE { do_etal_trc19_AnEffective,
        AUTHOR = { Loan N.N. Do and Hai L. Vu and Bao Q. Vo and Zhiyuan Liu and Dinh Phung },
        TITLE = { An effective spatial-temporal attention based neural network for traffic flow prediction },
        JOURNAL = { Transportation Research Part C: Emerging Technologies },
        YEAR = { 2019 },
        VOLUME = { 108 },
        PAGES = { 12--28 },
        ISSN = { 0968-090X },
        ABSTRACT = { Due to its importance in Intelligent Transport Systems (ITS), traffic flow prediction has been the focus of many studies in the last few decades. Existing traffic flow prediction models mainly extract static spatial-temporal correlations, although these correlations are known to be dynamic in traffic networks. Attention-based models have emerged in recent years, mostly in the field of natural language processing, and have resulted in major progresses in terms of both accuracy and interpretability. This inspires us to introduce the application of attentions for traffic flow prediction. In this study, a deep learning based traffic flow predictor with spatial and temporal attentions (STANN) is proposed. The spatial and temporal attentions are used to exploit the spatial dependencies between road segments and temporal dependencies between time steps respectively. Experiment results with a real-world traffic dataset demonstrate the superior performance of the proposed model. The results also show that the utilization of multiple data resolutions could help improve prediction accuracy. Furthermore, the proposed model is demonstrated to have potential for improving the understanding of spatial-temporal correlations in a traffic network. },
        DOI = { 10.1016/j.trc.2019.09.008 },
        FILE = { :do_etal_trc19_AnEffective - An Effective Spatial Temporal Attention Based Neural Network for Traffic Flow Prediction.pdf:PDF },
        KEYWORDS = { Traffic flow prediction, Traffic flow forecasting, Deep learning, Neural network, Attention },
        URL = { http://www.sciencedirect.com/science/article/pii/S0968090X19301330 },
    }
J
  • Learning Generative Adversarial Networks from Multiple Data Sources
    Trung Le, Quan Hoang, Hung Vu, Tu Dinh Nguyen, Hung Bui and Dinh Phung. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), pages 2823-2829, July 2019. [ | | pdf]
    Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. For this problem, we enrich both GANs’ formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN’s effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair.
    @INPROCEEDINGS { le_etal_ijcai19_learningGAN,
        AUTHOR = { Trung Le and Quan Hoang and Hung Vu and Tu Dinh Nguyen and Hung Bui and Dinh Phung },
        TITLE = { Learning Generative Adversarial Networks from Multiple Data Sources },
        BOOKTITLE = { Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI) },
        YEAR = { 2019 },
        PAGES = { 2823--2829 },
        MONTH = { July },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        ABSTRACT = { Generative Adversarial Networks (GANs) are a powerful class of deep generative models. In this paper, we extend GAN to the problem of generating data that are not only close to a primary data source but also required to be different from auxiliary data sources. For this problem, we enrich both GANs’ formulations and applications by introducing pushing forces that thrust generated samples away from given auxiliary data sources. We term our method Push-and-Pull GAN (P2GAN). We conduct extensive experiments to demonstrate the merit of P2GAN in two applications: generating data with constraints and addressing the mode collapsing problem. We use CIFAR-10, STL-10, and ImageNet datasets and compute Fréchet Inception Distance to evaluate P2GAN’s effectiveness in addressing the mode collapsing problem. The results show that P2GAN outperforms the state-of-the-art baselines. For the problem of generating data with constraints, we show that P2GAN can successfully avoid generating specific features such as black hair. },
        FILE = { :le_etal_ijcai19_learningGAN - Learning Generative Adversarial Networks from Multiple Data Sources.pdf:PDF },
        URL = { https://www.ijcai.org/Proceedings/2019/391 },
    }
C
  • Three-Player Wasserstein GAN via Amortised Duality
    Nhan Dam, Quan Hoang, Trung Le, Tu Dinh Nguyen, Hung Bui and Dinh Phung. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), pages 2202-2208, July 2019. [ | | pdf]
    We propose a new formulation for learning generative adversarial networks (GANs) using optimal transport cost (the general form of Wasserstein distance) as the objective criterion to measure the dissimilarity between target distribution and learned distribution. Our formulation is based on the general form of the Kantorovich duality which is applicable to optimal transport with a wide range of cost functions that are not necessarily a metric. To make optimising this duality form amenable to gradient-based methods, we employ a function that acts as an amortised optimiser for the innermost optimisation problem. Interestingly, the amortised optimiser can be viewed as a mover since it strategically shifts around data points. The resulting formulation is a sequential min-max-min game with 3 players: the generator, the critic, and the mover where the new player, the mover, attempts to fool the critic by shifting the data around. Despite involving three players, we demonstrate that our proposed formulation can be solved reasonably effectively via a simple alternative gradient learning strategy. Compared with the existing Lipschitz-constrained formulations of Wasserstein GAN on CIFAR-10, our model yields significantly better diversity scores than weight clipping and comparable performance to gradient penalty method.
    @INPROCEEDINGS { dam_etal_ijcai19_3pwgan,
        AUTHOR = { Nhan Dam and Quan Hoang and Trung Le and Tu Dinh Nguyen and Hung Bui and Dinh Phung },
        BOOKTITLE = { Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI) },
        TITLE = { Three-Player {W}asserstein {GAN} via Amortised Duality },
        YEAR = { 2019 },
        MONTH = { July },
        PAGES = { 2202--2208 },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        ABSTRACT = { We propose a new formulation for learning generative adversarial networks (GANs) using optimal transport cost (the general form of Wasserstein distance) as the objective criterion to measure the dissimilarity between target distribution and learned distribution. Our formulation is based on the general form of the Kantorovich duality which is applicable to optimal transport with a wide range of cost functions that are not necessarily a metric. To make optimising this duality form amenable to gradient-based methods, we employ a function that acts as an amortised optimiser for the innermost optimisation problem. Interestingly, the amortised optimiser can be viewed as a mover since it strategically shifts around data points. The resulting formulation is a sequential min-max-min game with 3 players: the generator, the critic, and the mover where the new player, the mover, attempts to fool the critic by shifting the data around. Despite involving three players, we demonstrate that our proposed formulation can be solved reasonably effectively via a simple alternative gradient learning strategy. Compared with the existing Lipschitz-constrained formulations of Wasserstein GAN on CIFAR-10, our model yields significantly better diversity scores than weight clipping and comparable performance to gradient penalty method. },
        FILE = { :dam_etal_ijcai19_3pwgan - Three Player Wasserstein GAN Via Amortised Duality.pdf:PDF },
        URL = { https://www.ijcai.org/Proceedings/2019/305 },
    }
C
  • Learning How to Active Learn by Dreaming
    Thuy-Trang Vu, Ming Liu, Dinh Phung and Gholamreza Haffari. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy, Jul 2019. [ | ]
    @INPROCEEDINGS { vu_etal_acl19_learning,
        AUTHOR = { Thuy-Trang Vu and Ming Liu and Dinh Phung and Gholamreza Haffari },
        TITLE = { Learning How to Active Learn by Dreaming },
        BOOKTITLE = { Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL) },
        YEAR = { 2019 },
        ADDRESS = { Florence, Italy },
        MONTH = { jul },
    }
C
  • Deep Domain Adaptation for Vulnerable Code Function Identification
    Van Nguyen, Trung Le, Tue Le, Khanh Nguyen, Olivier DeVel, Paul Montague, Lizhen Qu and Dinh Phung. In Int. Joint Conf. on Neural Networks (IJCNN), 2019. [ | ]
    Due to the ubiquity of computer software, software vulnerability detection (SVD) has become crucial in the software industry and in the field of computer security. Two significant issues in SVD arise when using machine learning, namely: i) how to learn automatic features that can help improve the predictive performance of vulnerability detection and ii) how to overcome the scarcity of labeled vulnerabilities in projects that require the laborious labeling of code by software security experts. In this paper, we address these two crucial concerns by proposing a novel architecture which leverages deep domain adaptation with automatic feature learning for software vulnerability identification. Based on this architecture, we keep the principles and reapply the state-of-the-art deep domain adaptation methods to show that deep domain adaptation for SVD is plausible and promising. Moreover, we further propose a novel method named Semi-supervised Code Domain Adaptation Network (SCDAN) that can efficiently utilize and exploit information carried in unlabeled target data by considering them as the unlabeled portion in a semi-supervised learning context. The proposed SCDAN method enforces the clustering assumption, which is a key principle in semi-supervised learning. The experimental results using six real-world software project datasets show that our SCDAN method and the baselines using our architecture achieve better predictive performance by a wide margin compared with the Deep Code Network (VulDeePecker) method without domain adaptation. Also, the proposed SCDAN significantly outperforms DIRT-T, which to the best of our knowledge is currently the state-of-the-art method in deep domain adaptation, and other baselines.
    @INPROCEEDINGS { van_etal_ijcnn19_deepdomain,
        AUTHOR = { Van Nguyen and Trung Le and Tue Le and Khanh Nguyen and Olivier DeVel and Paul Montague and Lizhen Qu and Dinh Phung },
        TITLE = { Deep Domain Adaptation for Vulnerable Code Function Identification },
        BOOKTITLE = { Int. Joint Conf. on Neural Networks (IJCNN) },
        YEAR = { 2019 },
        ABSTRACT = { Due to the ubiquity of computer software, software vulnerability detection (SVD) has become crucial in the software industry and in the field of computer security. Two significant issues in SVD arise when using machine learning, namely: i) how to learn automatic features that can help improve the predictive performance of vulnerability detection and ii) how to overcome the scarcity of labeled vulnerabilities in projects that require the laborious labeling of code by software security experts. In this paper, we address these two crucial concerns by proposing a novel architecture which leverages deep domain adaptation with automatic feature learning for software vulnerability identification. Based on this architecture, we keep the principles and reapply the state-of-the-art deep domain adaptation methods to show that deep domain adaptation for SVD is plausible and promising. Moreover, we further propose a novel method named Semi-supervised Code Domain Adaptation Network (SCDAN) that can efficiently utilize and exploit information carried in unlabeled target data by considering them as the unlabeled portion in a semi-supervised learning context. The proposed SCDAN method enforces the clustering assumption, which is a key principle in semi-supervised learning. The experimental results using six real-world software project datasets show that our SCDAN method and the baselines using our architecture achieve better predictive performance by a wide margin compared with the Deep Code Network (VulDeePecker) method without domain adaptation. Also, the proposed SCDAN significantly outperforms DIRT-T, which to the best of our knowledge is currently the state-of-the-art method in deep domain adaptation, and other baselines. },
        FILE = { :van_etal_ijcnn19_deepdomain - Deep Domain Adaptation for Vulnerable Code Function Identification.pdf:PDF },
    }
C
  • A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization
    Dai Quoc Nguyen, Thanh Vu, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL), Minneapolis, USA, Jun 2019. [ | | pdf]
    In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). Our CapsE represents each triple as a 3-column matrix where each column vector represents the embedding of an element in the triple. This 3-column matrix is then fed to a convolution layer where multiple filters are operated to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via a dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains state-of-the-art link prediction results for knowledge graph completion on two benchmark datasets: WN18RR and FB15k-237, and outperforms strong search personalization baselines on the SEARCH17 dataset.
    @INPROCEEDINGS { nguyen_etal_naaclhtl19_acapsule,
        AUTHOR = { Dai Quoc Nguyen and Thanh Vu and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung },
        TITLE = { A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization },
        BOOKTITLE = { Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL) },
        YEAR = { 2019 },
        ADDRESS = { Minneapolis, USA },
        MONTH = { jun },
        ABSTRACT = { In this paper, we introduce an embedding model, named CapsE, exploring a capsule network to model relationship triples (subject, relation, object). Our CapsE represents each triple as a 3-column matrix where each column vector represents the embedding of an element in the triple. This 3-column matrix is then fed to a convolution layer where multiple filters are operated to generate different feature maps. These feature maps are used to construct capsules in the first capsule layer. Capsule layers are connected via a dynamic routing mechanism. The last capsule layer consists of only one capsule to produce a vector output. The length of this vector output is used to measure the plausibility of the triple. Our proposed CapsE obtains state-of-the-art link prediction results for knowledge graph completion on two benchmark datasets: WN18RR and FB15k-237, and outperforms strong search personalization baselines on the SEARCH17 dataset. },
        FILE = { :nguyen_etal_naaclhtl19_acapsule - A Capsule Network Based Embedding Model for Knowledge Graph Completion and Search Personalization.pdf:PDF },
        URL = { https://arxiv.org/abs/1808.04122 },
    }
C
  • Probabilistic Multilevel Clustering via Composite Transportation Distance
    Viet Huynh, Nhat Ho, Dinh Phung and Michael I. Jordan. In Proc. of the Int. Conf. on Artificial Intelligence and Statistics (AISTATS), Okinawa, Japan, Apr 2019. [ | | pdf]
    We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence. Our method involves solving a joint optimization problem over spaces of probability measures to simultaneously discover grouping structures within groups and among groups. By exploiting the connection of our method to the problem of finding composite transportation barycenters, we develop fast and efficient optimization algorithms even for potentially large-scale multilevel datasets. Finally, we present experimental results with both synthetic and real data to demonstrate the efficiency and scalability of the proposed approach.
    @INPROCEEDINGS { ho_etal_aistats19_probabilistic,
        AUTHOR = { Viet Huynh and Nhat Ho and Dinh Phung and Michael I. Jordan },
        TITLE = { Probabilistic Multilevel Clustering via Composite Transportation Distance },
        BOOKTITLE = { Proc. of the Int. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2019 },
        ADDRESS = { Okinawa, Japan },
        MONTH = { apr },
        ABSTRACT = { We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence. Our method involves solving a joint optimization problem over spaces of probability measures to simultaneously discover grouping structures within groups and among groups. By exploiting the connection of our method to the problem of finding composite transportation barycenters, we develop fast and efficient optimization algorithms even for potentially large-scale multilevel datasets. Finally, we present experimental results with both synthetic and real data to demonstrate the efficiency and scalability of the proposed approach. },
        FILE = { :ho_etal_aistats19_probabilistic - Probabilistic Multilevel Clustering Via Composite Transportation Distance.pdf:PDF },
        URL = { https://arxiv.org/abs/1810.11911 },
    }
C
  • Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection
    Tue Le, Tuan Nguyen, Trung Le, Dinh Phung, Paul Montague, Olivier De Vel and Lizhen Qu. In International Conference on Learning Representations (ICLR), 2019. [ | | pdf]
    @INPROCEEDINGS { le_etal_iclr18_maximal,
        AUTHOR = { Tue Le and Tuan Nguyen and Trung Le and Dinh Phung and Paul Montague and Olivier De Vel and Lizhen Qu },
        TITLE = { Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection },
        BOOKTITLE = { International Conference on Learning Representations (ICLR) },
        YEAR = { 2019 },
        FILE = { :le_etal_iclr18_maximal - Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection.pdf:PDF },
        URL = { https://openreview.net/forum?id=ByloIiCqYQ },
    }
C
  • Robust Anomaly Detection in Videos using Multilevel Representations
    Hung Vu, Tu Dinh Nguyen, Trung Le, Wei Luo and Dinh Phung. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), Honolulu, USA, 2019. [ | | pdf]
    @INPROCEEDINGS { vu_etal_aaai19_robustanomaly,
        AUTHOR = { Hung Vu and Tu Dinh Nguyen and Trung Le and Wei Luo and Dinh Phung },
        TITLE = { Robust Anomaly Detection in Videos using Multilevel Representations },
        BOOKTITLE = { Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2019 },
        ADDRESS = { Honolulu, USA },
        FILE = { :vu_etal_aaai19_robustanomaly - Robust Anomaly Detection in Videos Using Multilevel Representations.pdf:PDF },
        GROUPS = { Anomaly Detection },
        URL = { https://github.com/SeaOtter/vad_gan },
    }
C
  • Usefulness of Wearable Cameras as a Tool to Enhance Chronic Disease Self-Management: Scoping Review
    Ralph Maddison, Susie Cartledge, Michelle Rogerson, Nicole Sylvia Goedhart, Tarveen Ragbir Singh, Christopher Neil, Dinh Phung and Kylie Ball. JMIR mHealth and uHealth, 7(1):e10371, Jan 2019. [ | | pdf]
    Background: Self-management is a critical component of chronic disease management and can include a host of activities, such as adhering to prescribed medications, undertaking daily care activities, managing dietary intake and body weight, and proactively contacting medical practitioners. The rise of technologies (mobile phones, wearable cameras) for health care use offers potential support for people to better manage their disease in collaboration with their treating health professionals. Wearable cameras can be used to provide rich contextual data and insight into everyday activities and aid in recall. This information can then be used to prompt memory recall or guide the development of interventions to support self-management. Application of wearable cameras to better understand and augment self-management by people with chronic disease has yet to be investigated. Objective: The objective of our review was to ascertain the scope of the literature on the use of wearable cameras for self-management by people with chronic disease and to determine the potential of wearable cameras to assist people to better manage their disease. Methods: We conducted a scoping review, which involved a comprehensive electronic literature search of 9 databases in July 2017. The search strategy focused on studies that used wearable cameras to capture one or more modifiable lifestyle risk factors associated with chronic disease or to capture typical self-management behaviors, or studies that involved a chronic disease population. We then categorized and described included studies according to their characteristics (eg, behaviors measured, study design or type, characteristics of the sample). Results: We identified 31 studies: 25 studies involved primary or secondary data analysis, and 6 were review, discussion, or descriptive articles. Wearable cameras were predominantly used to capture dietary intake, physical activity, activities of daily living, and sedentary behavior. Populations studied were predominantly healthy volunteers, school students, and sports people, with only 1 study examining an intervention using wearable cameras for people with an acquired brain injury. Most studies highlighted technical or ethical issues associated with using wearable cameras, many of which were overcome. Conclusions: This scoping review highlighted the potential of wearable cameras to capture health-related behaviors and risk factors of chronic disease, such as diet, exercise, and sedentary behaviors. Data collected from wearable cameras can be used as an adjunct to traditional data collection methods such as self-reported diaries in addition to providing valuable contextual information. While most studies to date have focused on healthy populations, wearable cameras offer promise to better understand self-management of chronic disease and its context.
    @ARTICLE { maddison_etal_jmir19_usefulness,
        AUTHOR = { Ralph Maddison and Susie Cartledge and Michelle Rogerson and Nicole Sylvia Goedhart and Tarveen Ragbir Singh and Christopher Neil and Dinh Phung and Kylie Ball },
        JOURNAL = { JMIR mHealth and uHealth },
        TITLE = { Usefulness of Wearable Cameras as a Tool to Enhance Chronic Disease Self-Management: Scoping Review },
        YEAR = { 2019 },
        ISSN = { 2291-5222 },
        MONTH = { Jan },
        NUMBER = { 1 },
        PAGES = { e10371 },
        VOLUME = { 7 },
        ABSTRACT = { Background: Self-management is a critical component of chronic disease management and can include a host of activities, such as adhering to prescribed medications, undertaking daily care activities, managing dietary intake and body weight, and proactively contacting medical practitioners. The rise of technologies (mobile phones, wearable cameras) for health care use offers potential support for people to better manage their disease in collaboration with their treating health professionals. Wearable cameras can be used to provide rich contextual data and insight into everyday activities and aid in recall. This information can then be used to prompt memory recall or guide the development of interventions to support self-management. Application of wearable cameras to better understand and augment self-management by people with chronic disease has yet to be investigated. Objective: The objective of our review was to ascertain the scope of the literature on the use of wearable cameras for self-management by people with chronic disease and to determine the potential of wearable cameras to assist people to better manage their disease. Methods: We conducted a scoping review, which involved a comprehensive electronic literature search of 9 databases in July 2017. The search strategy focused on studies that used wearable cameras to capture one or more modifiable lifestyle risk factors associated with chronic disease or to capture typical self-management behaviors, or studies that involved a chronic disease population. We then categorized and described included studies according to their characteristics (eg, behaviors measured, study design or type, characteristics of the sample). Results: We identified 31 studies: 25 studies involved primary or secondary data analysis, and 6 were review, discussion, or descriptive articles. Wearable cameras were predominantly used to capture dietary intake, physical activity, activities of daily living, and sedentary behavior. Populations studied were predominantly healthy volunteers, school students, and sports people, with only 1 study examining an intervention using wearable cameras for people with an acquired brain injury. Most studies highlighted technical or ethical issues associated with using wearable cameras, many of which were overcome. Conclusions: This scoping review highlighted the potential of wearable cameras to capture health-related behaviors and risk factors of chronic disease, such as diet, exercise, and sedentary behaviors. Data collected from wearable cameras can be used as an adjunct to traditional data collection methods such as self-reported diaries in addition to providing valuable contextual information. While most studies to date have focused on healthy populations, wearable cameras offer promise to better understand self-management of chronic disease and its context. },
        DAY = { 03 },
        DOI = { 10.2196/10371 },
        FILE = { :ralph_etal_jmir19_usefulness - Usefulness of Wearable Cameras As a Tool to Enhance Chronic Disease Self Management_ Scoping Review.pdf:PDF },
        KEYWORDS = { eHealth; review; cameras; life-logging; lifestyle behavior; chronic disease },
        URL = { https://mhealth.jmir.org/2019/1/e10371/ },
    }
J
  • On Deep Domain Adaptation: Some Theoretical Understandings
    Trung Le, Khanh Nguyen, Nhat Ho, Hung Bui and Dinh Phung, Jun 2019. [ | | pdf]
    Compared with shallow domain adaptation, recent progress in deep domain adaptation has shown that it can achieve higher predictive performance and stronger capacity to tackle structural data (e.g., image and sequential data). The underlying idea of deep domain adaptation is to bridge the gap between source and target domains in a joint space so that a supervised classifier trained on labeled source data can be nicely transferred to the target domain. This idea is certainly intuitive and powerful, however, limited theoretical understandings have been developed to support its underpinning principle. In this paper, we have provided a rigorous framework to explain why it is possible to close the gap of the target and source domains in the joint space. More specifically, we first study the loss incurred when performing transfer learning from the source to the target domain. This provides a theory that explains and generalizes existing work in deep domain adaptation which was mainly empirical. This enables us to further explain why closing the gap in the joint space can directly minimize the loss incurred for transfer learning between the two domains. To our knowledge, this offers the first theoretical result that characterizes a direct bound on the joint space and the gain of transfer learning via deep domain adaptation.
    @MISC { le_etal_arxiv19_ondeepdomain,
        AUTHOR = { Trung Le and Khanh Nguyen and Nhat Ho and Hung Bui and Dinh Phung },
        TITLE = { On Deep Domain Adaptation: Some Theoretical Understandings },
        MONTH = { jun },
        YEAR = { 2019 },
        ABSTRACT = { Compared with shallow domain adaptation, recent progress in deep domain adaptation has shown that it can achieve higher predictive performance and stronger capacity to tackle structural data (e.g., image and sequential data). The underlying idea of deep domain adaptation is to bridge the gap between source and target domains in a joint space so that a supervised classifier trained on labeled source data can be nicely transferred to the target domain. This idea is certainly intuitive and powerful, however, limited theoretical understandings have been developed to support its underpinning principle. In this paper, we have provided a rigorous framework to explain why it is possible to close the gap of the target and source domains in the joint space. More specifically, we first study the loss incurred when performing transfer learning from the source to the target domain. This provides a theory that explains and generalizes existing work in deep domain adaptation which was mainly empirical. This enables us to further explain why closing the gap in the joint space can directly minimize the loss incurred for transfer learning between the two domains. To our knowledge, this offers the first theoretical result that characterizes a direct bound on the joint space and the gain of transfer learning via deep domain adaptation. },
        ARCHIVEPREFIX = { arXiv },
        JOURNAL = { arXiv },
        URL = { http://arxiv.org/abs/1811.06199 },
    }
  • On Scalable Variant of Wasserstein Barycenter
    Tam Le, Viet Huynh, Nhat Ho, Dinh Phung and Makoto Yamada, 2019. [ | ]
    We study a variant of Wasserstein barycenter problem, which we refer to as \emph{tree-sliced Wasserstein barycenter}, by leveraging the structure of tree metrics for the ground metrics in the formulation of Wasserstein distance. Drawing on the tree structure, we propose efficient algorithms for solving the unconstrained and constrained versions of tree-sliced Wasserstein barycenter. The algorithms have fast computational time and efficient memory usage, especially for high dimensional settings while demonstrating favorable results when the tree metrics are appropriately constructed. Experimental results on large-scale synthetic and real datasets from Wasserstein barycenter for documents with word embedding, multilevel clustering, and scalable Bayes problems show the advantages of tree-sliced Wasserstein barycenter over (Sinkhorn) Wasserstein barycenter.
    @MISC { le_etal_arxiv19_scalable,
        AUTHOR = { Tam Le and Viet Huynh and Nhat Ho and Dinh Phung and Makoto Yamada },
        TITLE = { On Scalable Variant of Wasserstein Barycenter },
        YEAR = { 2019 },
        ABSTRACT = { We study a variant of Wasserstein barycenter problem, which we refer to as \emph{tree-sliced Wasserstein barycenter}, by leveraging the structure of tree metrics for the ground metrics in the formulation of Wasserstein distance. Drawing on the tree structure, we propose efficient algorithms for solving the unconstrained and constrained versions of tree-sliced Wasserstein barycenter. The algorithms have fast computational time and efficient memory usage, especially for high dimensional settings while demonstrating favorable results when the tree metrics are appropriately constructed. Experimental results on large-scale synthetic and real datasets from Wasserstein barycenter for documents with word embedding, multilevel clustering, and scalable Bayes problems show the advantages of tree-sliced Wasserstein barycenter over (Sinkhorn) Wasserstein barycenter. },
        ARCHIVEPREFIX = { arXiv },
        EPRINT = { 1910.04483 },
        PRIMARYCLASS = { stat.ML },
    }
  • Perturbations are not Enough: Generating Adversarial Examples with Spatial Distortions
    He Zhao, Trung Le, Paul Montague, Olivier De Vel, Tamas Abraham and Dinh Phung, 2019. [ | ]
    @MISC { zhao_etal_arxiv19_perturbations,
        AUTHOR = { He Zhao and Trung Le and Paul Montague and Olivier De Vel and Tamas Abraham and Dinh Phung },
        TITLE = { Perturbations are not Enough: Generating Adversarial Examples with Spatial Distortions },
        YEAR = { 2019 },
        ARCHIVEPREFIX = { arXiv },
        EPRINT = { 1910.01329 },
        PRIMARYCLASS = { cs.LG },
    }
  • Unsupervised Universal Self-Attention Network for Graph Classification
    Dai Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung, 2019. [ | ]
    @MISC { nguyen_etal_arxiv19_unsupervised,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { Unsupervised Universal Self-Attention Network for Graph Classification },
        YEAR = { 2019 },
        ARCHIVEPREFIX = { arXiv },
        EPRINT = { 1909.11855 },
        PRIMARYCLASS = { cs.LG },
    }
  • On Efficient Multilevel Clustering via Wasserstein Distances
    Viet Huynh, Nhat Ho, Nhan Dam, XuanLong Nguyen, Mikhail Yurochkin, Hung Bui and Dinh Phung, 2019. [ | ]
    @MISC { huynh_etal_arxiv19_efficient,
        AUTHOR = { Viet Huynh and Nhat Ho and Nhan Dam and XuanLong Nguyen and Mikhail Yurochkin and Hung Bui and Dinh Phung },
        TITLE = { On Efficient Multilevel Clustering via Wasserstein Distances },
        YEAR = { 2019 },
        ARCHIVEPREFIX = { arXiv },
        EPRINT = { 1909.08787 },
        PRIMARYCLASS = { stat.ML },
    }
2018
  • Model-Based Learning for Point Pattern Data
    Ba-Ngu Vo, Nhan Dam, Dinh Phung, Quang N. Tran and Ba-Tuong Vo. Pattern Recognition (PR), 84:136-151, 2018. [ | | pdf]
    This article proposes a framework for model-based point pattern learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed.
    @ARTICLE { vo_etal_pr18_modelbased,
        AUTHOR = { Ba-Ngu Vo and Nhan Dam and Dinh Phung and Quang N. Tran and Ba-Tuong Vo },
        JOURNAL = { Pattern Recognition (PR) },
        TITLE = { Model-Based Learning for Point Pattern Data },
        YEAR = { 2018 },
        ISSN = { 0031-3203 },
        PAGES = { 136--151 },
        VOLUME = { 84 },
        ABSTRACT = { This article proposes a framework for model-based point pattern learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed. },
        DOI = { https://doi.org/10.1016/j.patcog.2018.07.008 },
        FILE = { :vo_etal_pr18_modelbased - Model Based Learning for Point Pattern Data.pdf:PDF },
        KEYWORDS = { Point pattern, Point process, Random finite set, Multiple instance learning, Classification, Novelty detection, Clustering },
        PUBLISHER = { Elsevier },
        URL = { http://www.sciencedirect.com/science/article/pii/S0031320318302395 },
    }
J
  • Robust Bayesian Kernel Machine via Stein Variational Gradient Descent for Big Data
    Khanh Nguyen, Trung Le, Tu Nguyen, Geoff Webb and Dinh Phung. In Proc. of the 24th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), London, UK, Aug 2018. [ | ]
    Kernel methods are powerful supervised machine learning models thanks to their strong generalization ability, especially their capacity to generalize effectively from limited data to unseen data. However, most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization, making them infeasible to apply to large-scale datasets. This issue is exacerbated when kernel methods are used in conjunction with a grid search to tune their kernel parameters and hyperparameters, which raises the question of model robustness when applied to real datasets. In this paper, we propose a robust Bayesian Kernel Machine (BKM) – a Bayesian kernel machine that exploits the strengths of both Bayesian modelling and kernel methods. A key challenge for such a formulation is the need for an efficient learning algorithm. To this end, we successfully extended the recent Stein variational theory for Bayesian inference for our proposed model, resulting in fast and efficient learning and prediction algorithms. Importantly, our proposed BKM is resilient to the curse of kernelization, hence making it applicable to large-scale datasets and robust to parameter tuning, avoiding the associated expense and potential pitfalls with the current practice of parameter tuning. Our extensive experimental results on 12 benchmark datasets show that our BKM, without tuning any parameter, can achieve comparable predictive performance with the state-of-the-art LIBSVM and significantly outperforms other baselines, while obtaining a significant speedup in total training time compared with its rivals.
    @INPROCEEDINGS { nguyen_etal_kdd18_robustbayesian,
        AUTHOR = { Khanh Nguyen and Trung Le and Tu Nguyen and Geoff Webb and Dinh Phung },
        TITLE = { Robust Bayesian Kernel Machine via Stein Variational Gradient Descent for Big Data },
        BOOKTITLE = { Proc. of the 24th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD) },
        YEAR = { 2018 },
        ADDRESS = { London, UK },
        MONTH = { aug },
        PUBLISHER = { ACM },
        ABSTRACT = { Kernel methods are powerful supervised machine learning models thanks to their strong generalization ability, especially their capacity to generalize effectively from limited data to unseen data. However, most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization, making them infeasible to apply to large-scale datasets. This issue is exacerbated when kernel methods are used in conjunction with a grid search to tune their kernel parameters and hyperparameters, which raises the question of model robustness when applied to real datasets. In this paper, we propose a robust Bayesian Kernel Machine (BKM) – a Bayesian kernel machine that exploits the strengths of both Bayesian modelling and kernel methods. A key challenge for such a formulation is the need for an efficient learning algorithm. To this end, we successfully extended the recent Stein variational theory for Bayesian inference for our proposed model, resulting in fast and efficient learning and prediction algorithms. Importantly, our proposed BKM is resilient to the curse of kernelization, hence making it applicable to large-scale datasets and robust to parameter tuning, avoiding the associated expense and potential pitfalls with the current practice of parameter tuning. Our extensive experimental results on 12 benchmark datasets show that our BKM, without tuning any parameter, can achieve comparable predictive performance with the state-of-the-art LIBSVM and significantly outperforms other baselines, while obtaining a significant speedup in total training time compared with its rivals. },
        FILE = { :nguyen_etal_kdd18_robustbayesian - Robust Bayesian Kernel Machine Via Stein Variational Gradient Descent for Big Data.pdf:PDF },
    }
C
  • MGAN: Training Generative Adversarial Nets with Multiple Generators
    Quan Hoang, Tu Dinh Nguyen, Trung Le and Dinh Phung. In International Conference on Learning Representations (ICLR), 2018. [ | | pdf]
    We propose in this paper a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapsing problem and delivering state-of-the-art results. A minimax formulation is established among a classifier, a discriminator, and a set of generators, in a similar spirit to GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them is randomly selected as the final output, similar to the mechanism of a probabilistic mixture model. We term our method Mixture Generative Adversarial Nets (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators’ distributions and the empirical data distribution is minimal, whilst the JSD among generators’ distributions is maximal, hence effectively avoiding the mode collapsing problem. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over the latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by the generators.
    @INPROCEEDINGS { hoang_etal_iclr18_mgan,
        AUTHOR = { Quan Hoang and Tu Dinh Nguyen and Trung Le and Dinh Phung },
        TITLE = { {MGAN}: Training Generative Adversarial Nets with Multiple Generators },
        BOOKTITLE = { International Conference on Learning Representations (ICLR) },
        YEAR = { 2018 },
        ABSTRACT = { We propose in this paper a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapsing problem and delivering state-of-the-art results. A minimax formulation is established among a classifier, a discriminator, and a set of generators, in a similar spirit to GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them is randomly selected as the final output, similar to the mechanism of a probabilistic mixture model. We term our method Mixture Generative Adversarial Nets (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators’ distributions and the empirical data distribution is minimal, whilst the JSD among generators’ distributions is maximal, hence effectively avoiding the mode collapsing problem. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over the latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by the generators. },
        FILE = { :hoang_etal_iclr18_mgan - MGAN_ Training Generative Adversarial Nets with Multiple Generators.pdf:PDF },
        URL = { https://openreview.net/forum?id=rkmu5b0a- },
    }
C
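The mixture sampling mechanism described in the MGAN abstract above (draw samples from multiple generators, then pick one uniformly at random as the final output) can be sketched in a few lines. Note this is a toy illustration only: the linear "generators", the latent and output dimensions, and the uniform mixture weights are assumptions for exposition, not the authors' trained networks.

```python
import numpy as np

rng = np.random.default_rng(1)
num_generators = 3  # hypothetical count, purely for illustration
latent_dim = 5
output_dim = 2

# Stand-ins for trained generators: each is a fixed random linear map
# from latent space to data space, standing in for a neural network.
generators = [rng.normal(size=(output_dim, latent_dim)) for _ in range(num_generators)]

def mgan_sample(generators, rng):
    """Draw one sample from the mixture: pick a generator uniformly at
    random, then run it on fresh latent noise, mimicking a probabilistic
    mixture model as described in the abstract."""
    z = rng.normal(size=latent_dim)      # latent noise vector
    idx = rng.integers(len(generators))  # uniformly selected mixture index
    return generators[idx] @ z           # generated sample

x = mgan_sample(generators, rng)
```

In the paper's formulation the classifier is trained to recover `idx` from the sample, which is what pushes the generators toward distinct data modes.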
  • Geometric enclosing networks
    Trung Le, Hung Vu, Tu Dinh Nguyen and Dinh Phung. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), pages 2355-2361, July 2018. [ | ]
    Training a model to generate data has increasingly attracted research attention and has become important in modern-world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh idea that borrows the principle of the minimal enclosing ball to train a generator G(z) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective, so that its inverse from feature space to data space results in expressive nonlinear contours that describe the data manifold, hence ensuring that generated data also lie on the data manifold learned from training data. Our model enjoys a nice geometric interpretation, hence the name Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely a simple and easy-to-control optimization formulation, avoidance of mode collapse, and efficient learning of the data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthetic and real-world datasets to illustrate the behaviors, strengths and weaknesses of our proposed GEN, in particular its ability to handle multi-modal data and the quality of generated data.
    @INPROCEEDINGS { le_etal_ijcai18_geometric,
        AUTHOR = { Trung Le and Hung Vu and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { Geometric enclosing networks },
        BOOKTITLE = { Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, {IJCAI-18} },
        PUBLISHER = { International Joint Conferences on Artificial Intelligence Organization },
        PAGES = { 2355--2361 },
        YEAR = { 2018 },
        MONTH = { July },
        ABSTRACT = { Training a model to generate data has increasingly attracted research attention and has become important in modern-world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh idea that borrows the principle of the minimal enclosing ball to train a generator G(z) in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective, so that its inverse from feature space to data space results in expressive nonlinear contours that describe the data manifold, hence ensuring that generated data also lie on the data manifold learned from training data. Our model enjoys a nice geometric interpretation, hence the name Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely a simple and easy-to-control optimization formulation, avoidance of mode collapse, and efficient learning of the data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthetic and real-world datasets to illustrate the behaviors, strengths and weaknesses of our proposed GEN, in particular its ability to handle multi-modal data and the quality of generated data. },
        FILE = { :le_etal_ijcai18_geometric - Geometric Enclosing Networks.pdf:PDF },
    }
C
  • A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network
    Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen and Dinh Phung. In Proc. of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2018. [ | | pdf]
    We introduce ConvKB, a novel embedding method for the knowledge base completion task. Our approach advances the state of the art (SOTA) by employing a convolutional neural network (CNN) for the task, which can capture global relationships and transitional characteristics. We represent each triple (head entity, relation, tail entity) as a 3-column matrix which is the input to the convolution layer. Different filters of the same 1x3 shape are operated over the input matrix to produce different feature maps, which are then concatenated into a single feature vector. This vector is used to return a score for the triple via a dot product. The returned score is used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction results than previous SOTA models on the two current benchmark datasets WN18RR and FB15k-237.
    @INPROCEEDINGS { nguyen_etal_naacl18_anovelembedding,
        AUTHOR = { Dai Quoc Nguyen and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung },
        TITLE = { A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network },
        BOOKTITLE = { Proc. of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL) },
        YEAR = { 2018 },
        ABSTRACT = { We introduce ConvKB, a novel embedding method for the knowledge base completion task. Our approach advances the state of the art (SOTA) by employing a convolutional neural network (CNN) for the task, which can capture global relationships and transitional characteristics. We represent each triple (head entity, relation, tail entity) as a 3-column matrix which is the input to the convolution layer. Different filters of the same 1x3 shape are operated over the input matrix to produce different feature maps, which are then concatenated into a single feature vector. This vector is used to return a score for the triple via a dot product. The returned score is used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction results than previous SOTA models on the two current benchmark datasets WN18RR and FB15k-237. },
        FILE = { :nguyen_etal_naacl18_anovelembedding - A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network.pdf:PDF },
        URL = { https://arxiv.org/abs/1712.02121 },
    }
C
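The ConvKB scoring pipeline described in the abstract above (stack the triple into a 3-column matrix, slide 1x3 filters over it, concatenate the feature maps, and take a dot product with a weight vector) can be sketched in NumPy. The embedding size, random filter and weight values, and the omission of any nonlinearity and training loss are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

k = 4          # embedding dimension (hypothetical; real models use ~100)
n_filters = 3  # number of 1x3 convolution filters

# Hypothetical pre-trained embeddings for one triple (head, relation, tail).
h, r, t = rng.normal(size=(3, k))

def convkb_score(h, r, t, filters, w):
    """Score a triple: build the k x 3 input matrix, apply each 1x3 filter
    row-wise to get a k-dim feature map, concatenate maps into one feature
    vector, and return its dot product with the weight vector."""
    A = np.stack([h, r, t], axis=1)  # k x 3 input matrix
    maps = [A @ f for f in filters]  # n_filters feature maps, each length k
    v = np.concatenate(maps)         # single feature vector (n_filters * k)
    return float(v @ w)              # scalar validity score

filters = rng.normal(size=(n_filters, 3))
w = rng.normal(size=n_filters * k)
score = convkb_score(h, r, t, filters, w)
```

In training, this scalar score would be fed into a logistic loss so that valid triples receive higher scores than corrupted ones.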
  • Text Generation with Deep Variational GAN
    Mahmoud Hossam, Trung Le, Michael Papasimeon, Viet Huynh and Dinh Phung. In 32nd Neural Information Processing System (NIPS) Workshop on Bayesian Deep Learning, 2018. [ | ]
    Generating realistic sequences is a central task in many machine learning applications. There has been considerable recent progress on building deep generative models for sequence generation tasks. However, mode collapse remains a main issue for current models. In this paper we propose a GAN-based generic framework to address the problem of mode collapse in a principled approach. We change the standard GAN objective to maximize a variational lower bound of the log-likelihood while minimizing the Jensen-Shannon divergence between the data and model distributions. We experiment with our model on the text generation task and show that it can generate realistic text with high diversity.
    @INPROCEEDINGS { hossam_etal_bdl18_textgeneration,
        AUTHOR = { Mahmoud Hossam and Trung Le and Michael Papasimeon and Viet Huynh and Dinh Phung },
        TITLE = { Text Generation with Deep Variational {GAN} },
        BOOKTITLE = { 32nd Neural Information Processing System (NIPS) Workshop on Bayesian Deep Learning },
        YEAR = { 2018 },
        ABSTRACT = { Generating realistic sequences is a central task in many machine learning applications. There has been considerable recent progress on building deep generative models for sequence generation tasks. However, mode collapse remains a main issue for current models. In this paper we propose a GAN-based generic framework to address the problem of mode collapse in a principled approach. We change the standard GAN objective to maximize a variational lower bound of the log-likelihood while minimizing the Jensen-Shannon divergence between the data and model distributions. We experiment with our model on the text generation task and show that it can generate realistic text with high diversity. },
        FILE = { :hossam_etal_bdl18_textgeneration - Text Generation with Deep Variational GAN.pdf:PDF },
    }
C
  • Batch-normalized Deep Boltzmann Machines
    Hung Vu, Tu Dinh Nguyen, Trung Le, Wei Luo and Dinh Phung. In Proceedings of the Asian Conference on Machine Learning (ACML), Beijing, China, 2018. [ | ]
    @INPROCEEDINGS { vu_etal_acml18_batchnormalized,
        AUTHOR = { Hung Vu and Tu Dinh Nguyen and Trung Le and Wei Luo and Dinh Phung },
        TITLE = { Batch-normalized Deep {Boltzmann} Machines },
        BOOKTITLE = { Proceedings of the Asian Conference on Machine Learning (ACML) },
        YEAR = { 2018 },
        ADDRESS = { Beijing, China },
        OWNER = { hungv },
        TIMESTAMP = { 2018.03.22 },
    }
C
  • Clustering Induced Kernel Learning
    Khanh Nguyen, Nhan Dam, Trung Le, Tu Dinh Nguyen and Dinh Phung. In Proc. of the 10th Asian Conference on Machine Learning (ACML), pages 129-144, Nov 2018. [ | | pdf]
    Learning rich and expressive kernel functions is a challenging task in kernel-based supervised learning. The multiple kernel learning (MKL) approach addresses this problem by combining a mixed variety of kernels and letting the optimization solver choose the most appropriate combination. However, most existing methods are parametric in the sense that they require a predefined list of kernels. Hence, there is a substantial trade-off between computation and the modeling risk of not being able to explore more expressive and suitable kernel functions. Moreover, current approaches to combining kernels cannot exploit the clustering structure carried in data, especially when data are heterogeneous. In this work, we couple Bayesian nonparametric models (i.e., automatically growing kernel functions) with multiple kernel learning to develop a new framework that enjoys a nonparametric flavor in the context of multiple kernel learning. In particular, we propose the Clustering Induced Kernel Learning (CIK) method, which can automatically discover clustering structure from the data and simultaneously train a single kernel machine to fit the data in each discovered cluster. The outcome of our proposed method includes both a clustering analysis and a multiple kernel classifier for a given dataset. We conduct extensive experiments on several benchmark datasets. The experimental results show that our method can improve classification and clustering performance when datasets have complex clustering structure with different preferred kernels.
    @INPROCEEDINGS { nguyen_etal_acml18_clustering,
        AUTHOR = { Nguyen, Khanh and Dam, Nhan and Le, Trung and Nguyen, {Tu Dinh} and Phung, Dinh },
        TITLE = { Clustering Induced Kernel Learning },
        BOOKTITLE = { Proc. of the 10th Asian Conference on Machine Learning (ACML) },
        YEAR = { 2018 },
        EDITOR = { Zhu, Jun and Takeuchi, Ichiro },
        VOLUME = { 95 },
        SERIES = { Proceedings of Machine Learning Research },
        PAGES = { 129--144 },
        MONTH = { 14--16 Nov },
        PUBLISHER = { PMLR },
        ABSTRACT = { Learning rich and expressive kernel functions is a challenging task in kernel-based supervised learning. The multiple kernel learning (MKL) approach addresses this problem by combining a mixed variety of kernels and letting the optimization solver choose the most appropriate combination. However, most existing methods are parametric in the sense that they require a predefined list of kernels. Hence, there is a substantial trade-off between computation and the modeling risk of not being able to explore more expressive and suitable kernel functions. Moreover, current approaches to combining kernels cannot exploit the clustering structure carried in data, especially when data are heterogeneous. In this work, we couple Bayesian nonparametric models (i.e., automatically growing kernel functions) with multiple kernel learning to develop a new framework that enjoys a nonparametric flavor in the context of multiple kernel learning. In particular, we propose the \emph{Clustering Induced Kernel Learning} (CIK) method, which can automatically discover clustering structure from the data and simultaneously train a single kernel machine to fit the data in each discovered cluster. The outcome of our proposed method includes both a clustering analysis and a multiple kernel classifier for a given dataset. We conduct extensive experiments on several benchmark datasets. The experimental results show that our method can improve classification and clustering performance when datasets have complex clustering structure with different preferred kernels. },
        FILE = { :nguyen_etal_acml18_clustering - Clustering Induced Kernel Learning.pdf:PDF;nguyen18a.pdf:http\://proceedings.mlr.press/v95/nguyen18a/nguyen18a.pdf:PDF },
        URL = { http://proceedings.mlr.press/v95/nguyen18a.html },
    }
C
  • LTARM: A novel temporal association rule mining method to understand toxicities in a routine cancer treatment
    Dang Nguyen, Wei Luo, Dinh Phung and Svetha Venkatesh. Knowledge-Based Systems, 2018. [ | ]
    Cancer is a worldwide problem and one of the leading causes of death. The increasing prevalence of cancer, particularly in developing countries, demands better understanding of the effectiveness and adverse consequences of different cancer treatment regimes in real patient populations. Current understanding of cancer treatment toxicities is often derived from either “clean” patient cohorts or coarse population statistics. Thus, it is difficult to get up-to-date and local assessments of treatment toxicities for specific cancer centers. To address these problems, we propose a novel and efficient method for discovering toxicity progression patterns in the form of temporal association rules (TARs). A temporal association rule is a rule in which the diagnosis codes on the right-hand side (e.g., a combination of toxicities/complications) temporally occur after the diagnosis codes on the left-hand side (e.g., a particular type of cancer treatment). Our method develops a lattice structure to efficiently discover TARs. More specifically, the lattice structure is first constructed to store all frequent diagnosis codes in the dataset. It is then traversed using the parent-child relations among nodes to generate TARs. Our extensive experiments show the effectiveness of the proposed method in discovering major toxicity patterns in comparison with temporal comorbidity analysis. In addition, our method significantly outperforms existing methods for mining TARs in terms of runtime.
    @ARTICLE { nguyen_kbs18_ltarm,
        AUTHOR = { Dang Nguyen and Wei Luo and Dinh Phung and Svetha Venkatesh },
        TITLE = { {LTARM}: A novel temporal association rule mining method to understand toxicities in a routine cancer treatment },
        JOURNAL = { Knowledge-Based Systems },
        YEAR = { 2018 },
        ABSTRACT = { Cancer is a worldwide problem and one of the leading causes of death. The increasing prevalence of cancer, particularly in developing countries, demands better understanding of the effectiveness and adverse consequences of different cancer treatment regimes in real patient populations. Current understanding of cancer treatment toxicities is often derived from either “clean” patient cohorts or coarse population statistics. Thus, it is difficult to get up-to-date and local assessments of treatment toxicities for specific cancer centers. To address these problems, we propose a novel and efficient method for discovering toxicity progression patterns in the form of temporal association rules (TARs). A temporal association rule is a rule in which the diagnosis codes on the right-hand side (e.g., a combination of toxicities/complications) temporally occur after the diagnosis codes on the left-hand side (e.g., a particular type of cancer treatment). Our method develops a lattice structure to efficiently discover TARs. More specifically, the lattice structure is first constructed to store all frequent diagnosis codes in the dataset. It is then traversed using the parent-child relations among nodes to generate TARs. Our extensive experiments show the effectiveness of the proposed method in discovering major toxicity patterns in comparison with temporal comorbidity analysis. In addition, our method significantly outperforms existing methods for mining TARs in terms of runtime. },
        DOI = { https://doi.org/10.1016/j.knosys.2018.07.031 },
        FILE = { :nguyen_kbs18_ltarm - LTARM_ a Novel Temporal Association Rule Mining Method to Understand Toxicities in a Routine Cancer Treatment.pdf:PDF },
    }
J
  • Jointly predicting affective and mental health scores using deep neural networks of visual cues on the Web
    Hung Nguyen, Van Nguyen, Thin Nguyen, Mark Larsen, Bridianne O'Dea, Duc Thanh Nguyen, Trung Le, Dinh Phung, Svetha Venkatesh and Helen Christensen. In Proc. of the Int. Conf. on Web Information Systems Engineering (WISE), Springer, 2018. [ | ]
    Despite the range of studies examining the relationship between mental health and social media data, not all prior studies have validated the social media markers against “ground truth”, or validated psychiatric information, in general community samples. Instead, researchers have approximated psychiatric diagnosis using user statements such as “I have been diagnosed as X”. Without “ground truth”, the value of predictive algorithms is highly questionable and potentially harmful. In addition, for social media data, whilst linguistic features have been widely identified as strong markers of mental health disorders, little is known about the links between non-textual features and these disorders. The current work is a longitudinal study during which participants’ mental health data, consisting of depression and anxiety scores, were collected fortnightly with a validated, diagnostic, clinical measure. In addition, datasets with labels relevant to mental health scores, such as emotional scores, are employed to improve the performance in predicting mental health scores. This work introduces a deep neural network-based method that integrates sub-networks for predicting affective scores and mental health outcomes from images. Experimental results show that, in predicting both emotion and mental health scores, (1) deep features largely outperform handcrafted ones and (2) the proposed network achieves better performance than separate networks.
    @INCOLLECTION { nguyen_etal_wise18_jointly,
        AUTHOR = { Hung Nguyen and Van Nguyen and Thin Nguyen and Mark Larsen and Bridianne O'Dea and Duc Thanh Nguyen and Trung Le and Dinh Phung and Svetha Venkatesh and Helen Christensen },
        TITLE = { Jointly predicting affective and mental health scores using deep neural networks of visual cues on the Web },
        BOOKTITLE = { Proc. of the Int. Conf. on Web Information Systems Engineering (WISE) },
        PUBLISHER = { Springer },
        YEAR = { 2018 },
        SERIES = { Lecture Notes in Computer Science },
        ABSTRACT = { Despite the range of studies examining the relationship between mental health and social media data, not all prior studies have validated the social media markers against “ground truth”, or validated psychiatric information, in general community samples. Instead, researchers have approximated psychiatric diagnosis using user statements such as “I have been diagnosed as X”. Without “ground truth”, the value of predictive algorithms is highly questionable and potentially harmful. In addition, for social media data, whilst linguistic features have been widely identified as strong markers of mental health disorders, little is known about the links between non-textual features and these disorders. The current work is a longitudinal study during which participants’ mental health data, consisting of depression and anxiety scores, were collected fortnightly with a validated, diagnostic, clinical measure. In addition, datasets with labels relevant to mental health scores, such as emotional scores, are employed to improve the performance in predicting mental health scores. This work introduces a deep neural network-based method that integrates sub-networks for predicting affective scores and mental health outcomes from images. Experimental results show that, in predicting both emotion and mental health scores, (1) deep features largely outperform handcrafted ones and (2) the proposed network achieves better performance than separate networks. },
        FILE = { :nguyen_etal_wise18_jointly - Jointly Predicting Affective and Mental Health Scores Using Deep Neural Networks of Visual Cues on the Web.pdf:PDF },
        LANGUAGE = { English },
        OWNER = { thinng },
        TIMESTAMP = { 2017.08.28 },
    }
BC
  • Learning Graph Representation via Frequent Subgraphs
    Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh and Dinh Phung. In Proc. of SIAM Int. Conf. on Data Mining (SDM), 2018. (Student travel award). [ | ]
    @INPROCEEDINGS { nguyen_etal_sdm18_learning,
        AUTHOR = { Dang Nguyen and Wei Luo and Tu Dinh Nguyen and Svetha Venkatesh and Dinh Phung },
        TITLE = { Learning Graph Representation via Frequent Subgraphs },
        BOOKTITLE = { Proc. of SIAM Int. Conf. on Data Mining (SDM) },
        YEAR = { 2018 },
        PUBLISHER = { SIAM },
        NOTE = { Student travel award },
        FILE = { :nguyen_etal_sdm18_learning - Learning Graph Representation Via Frequent Subgraphs.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2018.01.12 },
    }
C
  • Sqn2Vec: Learning Sequence Representation via Sequential Patterns with a Gap Constraint
    Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh and Dinh Phung. In ECML-PKDD, 2018. (Runner-up Best Student Machine Learning Paper Award). [ | ]
    When learning sequence representations, traditional pattern-based methods often suffer from the data sparsity and high-dimensionality problems while recent neural embedding methods often fail on sequential datasets with a small vocabulary. To address these disadvantages, we propose an unsupervised method (named Sqn2Vec) which first leverages sequential patterns (SPs) to increase the vocabulary size and then learns low-dimensional continuous vectors for sequences via a neural embedding model. Moreover, our method enforces a gap constraint among symbols in sequences to obtain meaningful and discriminative SPs. Consequently, Sqn2Vec produces significantly better sequence representations than a comprehensive list of state-of-the-art baselines, particularly on sequential datasets with a relatively small vocabulary. We demonstrate the superior performance of Sqn2Vec in several machine learning tasks including sequence classification, clustering, and visualization.
    @INPROCEEDINGS { nguyen_etal_ecml18_sqn2vec,
        AUTHOR = { Dang Nguyen and Wei Luo and Tu Dinh Nguyen and Svetha Venkatesh and Dinh Phung },
        TITLE = { {Sqn2Vec}: Learning Sequence Representation via Sequential Patterns with a Gap Constraint },
        BOOKTITLE = { ECML-PKDD },
        YEAR = { 2018 },
        NOTE = { Runner-up Best Student Machine Learning Paper Award },
        ABSTRACT = { When learning sequence representations, traditional pattern-based methods often suffer from the data sparsity and high-dimensionality problems while recent neural embedding methods often fail on sequential datasets with a small vocabulary. To address these disadvantages, we propose an unsupervised method (named Sqn2Vec) which first leverages sequential patterns (SPs) to increase the vocabulary size and then learns low-dimensional continuous vectors for sequences via a neural embedding model. Moreover, our method enforces a gap constraint among symbols in sequences to obtain meaningful and discriminative SPs. Consequently, Sqn2Vec produces significantly better sequence representations than a comprehensive list of state-of-the-art baselines, particularly on sequential datasets with a relatively small vocabulary. We demonstrate the superior performance of Sqn2Vec in several machine learning tasks including sequence classification, clustering, and visualization. },
        FILE = { :nguyen_etal_ecml18_sqn2vec - Sqn2Vec_ Learning Sequence Representation Via Sequential Patterns with a Gap Constraint.pdf:PDF },
    }
C
  • A Convolutional Neural Network-based Model for Knowledge Base Completion and Its Application to Search Personalization
    Dai Quoc Nguyen, Dat Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung. Semantic Web journal (SWJ), 2018. [ | | pdf]
    In this paper, we propose a novel embedding model, named ConvKB, for knowledge base completion. Our model ConvKB advances state-of-the-art models by employing a convolutional neural network, so that it can capture global relationships and transitional characteristics between entities and relations in knowledge bases. In ConvKB, each triple (head entity, relation, tail entity) is represented as a 3-column matrix where each column vector represents a triple element. This 3-column matrix is then fed to a convolution layer where multiple filters are operated on the matrix to generate different feature maps. These feature maps are then concatenated into a single feature vector representing the input triple. The feature vector is multiplied with a weight vector via a dot product to return a score. This score is then used to predict whether the triple is valid or not. Experiments show that ConvKB obtains better link prediction and triple classification results than previous state-of-the-art models on the benchmark datasets WN18RR, FB15k-237, WN11 and FB13. We further apply our ConvKB to the search personalization problem, which aims to tailor search results to each specific user based on the user's personal interests and preferences. In particular, we model the potential relationship between the submitted query, the user and the search result (i.e., document) as a triple (query, user, document) on which ConvKB is able to work. Experimental results on query logs from a commercial web search engine show that ConvKB achieves better performance than the standard ranker as well as up-to-date search personalization baselines.
    @ARTICLE { nguyen_etal_swj18_convolutional,
        AUTHOR = { Dai Quoc Nguyen and Dat Quoc Nguyen and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { A Convolutional Neural Network-based Model for Knowledge Base Completion and Its Application to Search Personalization },
        JOURNAL = { Semantic Web journal (SWJ) },
        YEAR = { 2018 },
        ABSTRACT = { In this paper, we propose a novel embedding model, named ConvKB, for knowledge base completion. Our model ConvKB advances state-of-the-art models by employing a convolutional neural network, so that it can capture global relationships and transitional characteristics between entities and relations in knowledge bases. In ConvKB, each triple (head entity, relation, tail entity) is represented as a 3-column matrix where each column vector represents a triple element. This 3-column matrix is then fed to a convolution layer where multiple filters are operated on the matrix to generate different feature maps. These feature maps are then concatenated into a single feature vector representing the input triple. The feature vector is multiplied with a weight vector via a dot product to return a score. This score is then used to predict whether the triple is valid or not. Experiments show that ConvKB obtains better link prediction and triple classification results than previous state-of-the-art models on the benchmark datasets WN18RR, FB15k-237, WN11 and FB13. We further apply our ConvKB to the search personalization problem, which aims to tailor search results to each specific user based on the user's personal interests and preferences. In particular, we model the potential relationship between the submitted query, the user and the search result (i.e., document) as a triple \textit{(query, user, document)} on which ConvKB is able to work. Experimental results on query logs from a commercial web search engine show that ConvKB achieves better performance than the standard ranker as well as up-to-date search personalization baselines. },
        FILE = { :nguyen_etal_swj18_convolutional - A Convolutional Neural Network Based Model for Knowledge Base Completion and Its Application to Search Personalization.pdf:PDF },
        URL = { http://www.semantic-web-journal.net/system/files/swj1867.pdf },
    }
J
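    The ConvKB scoring function summarized in the entry above (triple stacked into a 3-column matrix, 1×3 filters producing feature maps, concatenation, dot product with a weight vector) can be sketched in a few lines of numpy. This is an illustrative re-implementation under stated assumptions (ReLU nonlinearity, random toy parameters), not the authors' code:

```python
import numpy as np

def convkb_score(h, r, t, filters, w):
    """Score a (head, relation, tail) triple in the ConvKB style:
    stack the three k-dim embeddings into a k x 3 matrix, slide 1x3
    filters over its rows, concatenate the feature maps, and take a
    dot product with a weight vector to obtain a scalar score."""
    A = np.stack([h, r, t], axis=1)          # k x 3 input matrix
    maps = np.maximum(A @ filters.T, 0.0)    # k x n_filters feature maps (ReLU assumed)
    v = maps.T.reshape(-1)                   # single concatenated feature vector
    return float(v @ w)                      # scalar plausibility score

rng = np.random.default_rng(0)
k, n_f = 4, 3                                # toy embedding size and filter count
score = convkb_score(rng.normal(size=k), rng.normal(size=k), rng.normal(size=k),
                     rng.normal(size=(n_f, 3)), rng.normal(size=n_f * k))
```

    In the paper the embeddings and filters are learned; here they are random placeholders so the scoring path can be run end to end.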
  • GoGP: Scalable Geometric-based Gaussian Process for Online Regression
    Trung Le, Khanh Nguyen, Vu Nguyen, Tu Dinh Nguyen and Dinh Phung. Knowledge and Information Systems (KAIS), May 2018. [ | ]
    One of the most challenging current problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that can scale to massive datasets. Our approach is formulated based on an alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We develop theory to guarantee that, with a good convergence rate, our proposed algorithm always produces a (sparse) solution which is close to the true optimum to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately to streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for the online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving an order-of-magnitude computational speedup compared with its rivals under the online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors.
    @ARTICLE { le_etal_kais18_gogp,
        AUTHOR = { Trung Le and Khanh Nguyen and Vu Nguyen and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { {GoGP}: Scalable Geometric-based Gaussian Process for Online Regression },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2018 },
        MONTH = { May },
        ABSTRACT = { One of the most challenging current problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that can scale to massive datasets. Our approach is formulated based on an alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We develop theory to guarantee that, with a good convergence rate, our proposed algorithm always produces a (sparse) solution which is close to the true optimum to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately to streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for the online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving an order-of-magnitude computational speedup compared with its rivals under the online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors. },
        FILE = { :le_etal_kais18_gogp - GoGP_ Scalable Geometric Based Gaussian Process for Online Regression.pdf:PDF },
    }
J
  • Effective Identification of Similar Patients through Sequential Matching over ICD Code Embedding
    Dang Nguyen, Wei Luo, Svetha Venkatesh and Dinh Phung. Journal of Medical Systems (JMS), 42(5):94, April 2018. [ | | pdf]
    Evidence-based medicine often involves the identification of patients with similar conditions, which are often captured in ICD code sequences. With no satisfactory prior solutions for matching ICD-10 code sequences, this paper presents a method which effectively captures the clinical similarity among routine patients who have multiple comorbidities and complex care needs. Our method leverages the recent progress in representation learning of individual ICD-10 codes, and it explicitly uses the sequential order of codes for matching. Empirical evaluation on a state-wide cancer data collection shows that our proposed method achieves significantly higher matching performance compared with state-of-the-art methods that ignore the sequential order. Our method better identifies similar patients in terms of a number of clinical outcomes, including readmission and mortality outlook. Although this paper focuses on ICD-10 diagnosis code sequences, our method can be adapted to work with other codified sequence data.
    @ARTICLE { nguyen_etal_jms18_effective,
        AUTHOR = { Dang Nguyen and Wei Luo and Svetha Venkatesh and Dinh Phung },
        TITLE = { Effective Identification of Similar Patients through Sequential Matching over ICD Code Embedding },
        JOURNAL = { Journal of Medical Systems (JMS) },
        YEAR = { 2018 },
        VOLUME = { 42 },
        NUMBER = { 5 },
        PAGES = { 94 },
        MONTH = { April },
        ABSTRACT = { Evidence-based medicine often involves the identification of patients with similar conditions, which are often captured in ICD code sequences. With no satisfactory prior solutions for matching ICD-10 code sequences, this paper presents a method which effectively captures the clinical similarity among routine patients who have multiple comorbidities and complex care needs. Our method leverages the recent progress in representation learning of individual ICD-10 codes, and it explicitly uses the sequential order of codes for matching. Empirical evaluation on a state-wide cancer data collection shows that our proposed method achieves significantly higher matching performance compared with state-of-the-art methods that ignore the sequential order. Our method better identifies similar patients in terms of a number of clinical outcomes, including readmission and mortality outlook. Although this paper focuses on ICD-10 diagnosis code sequences, our method can be adapted to work with other codified sequence data. },
        FILE = { :nguyen_etal_jms18_effective - Effective Identification of Similar Patients through Sequential Matching Over ICD Code Embedding.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2018.03.29 },
        URL = { https://link.springer.com/article/10.1007/s10916-018-0951-4 },
    }
J
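    The entry above matches patients via embedded ICD-10 code sequences while respecting sequential order. The paper's exact matching procedure is not reproduced here; dynamic time warping over code embeddings is one standard way to compare variable-length sequences order-sensitively, sketched below with made-up vectors standing in for learned code embeddings:

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic-time-warping distance between two sequences of code embeddings.

    seq_a, seq_b : lists of (d,)-shaped vectors (e.g. learned ICD-10 code
    embeddings). Uses Euclidean distance between codes; the alignment is
    monotone, so the sequential order of the codes is respected."""
    n, m = len(seq_a), len(seq_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

a = [np.array([0.0]), np.array([1.0])]
b = [np.array([0.0]), np.array([1.0]), np.array([1.0])]
d = dtw_distance(a, b)   # 0.0: the repeated final code aligns at no extra cost
```

    A sequence-order-ignoring baseline (e.g. averaging the code embeddings) would not distinguish reorderings that DTW does.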
  • Bayesian Multi-Hyperplane Machine for Pattern Recognition
    Khanh Nguyen, Trung Le, Tu Nguyen and Dinh Phung. In Proc. of the 24th International Conference on Pattern Recognition (ICPR), Beijing, China, August 2018. [ | ]
    The existing multi-hyperplane machine approach deals with high-dimensional and complex datasets by approximating the input data region using a parametric mixture of hyperplanes. Consequently, this approach requires an excessively time-consuming parameter search to find the set of optimal hyper-parameters. Another serious drawback of this approach is that it is often suboptimal, since the optimal choice for the hyper-parameter is likely to lie outside the searching space due to the space discretization step required in grid search. To address these challenges, we propose in this paper the BAyesian Multi-hyperplane Machine (BAMM). Our approach departs from a Bayesian perspective, and aims to construct an alternative probabilistic view in such a way that its maximum a posteriori (MAP) estimation reduces exactly to the original optimization problem of a multi-hyperplane machine. This view allows us to endow prior distributions over hyper-parameters and augment auxiliary variables to efficiently infer model parameters and hyper-parameters via a Markov chain Monte Carlo (MCMC) method. We then employ a Stochastic Gradient Descent (SGD) framework to scale our model up with ever-growing large datasets. Extensive experiments demonstrate the capability of our proposed method in learning the optimal model without any parameter tuning and in achieving accuracies comparable with state-of-the-art baselines, while seamlessly handling large-scale datasets.
    @INPROCEEDINGS { nguyen_etal_icpr18_bayesian,
        AUTHOR = { Khanh Nguyen and Trung Le and Tu Nguyen and Dinh Phung },
        TITLE = { Bayesian Multi-Hyperplane Machine for Pattern Recognition },
        BOOKTITLE = { Proc. of the 24th International Conference on Pattern Recognition (ICPR) },
        YEAR = { 2018 },
        ADDRESS = { Beijing, China },
        MONTH = { August },
        ABSTRACT = { The existing multi-hyperplane machine approach deals with high-dimensional and complex datasets by approximating the input data region using a parametric mixture of hyperplanes. Consequently, this approach requires an excessively time-consuming parameter search to find the set of optimal hyper-parameters. Another serious drawback of this approach is that it is often suboptimal, since the optimal choice for the hyper-parameter is likely to lie outside the searching space due to the space discretization step required in grid search. To address these challenges, we propose in this paper the BAyesian Multi-hyperplane Machine (BAMM). Our approach departs from a Bayesian perspective, and aims to construct an alternative probabilistic view in such a way that its maximum a posteriori (MAP) estimation reduces exactly to the original optimization problem of a multi-hyperplane machine. This view allows us to endow prior distributions over hyper-parameters and augment auxiliary variables to efficiently infer model parameters and hyper-parameters via a Markov chain Monte Carlo (MCMC) method. We then employ a Stochastic Gradient Descent (SGD) framework to scale our model up with ever-growing large datasets. Extensive experiments demonstrate the capability of our proposed method in learning the optimal model without any parameter tuning and in achieving accuracies comparable with state-of-the-art baselines, while seamlessly handling large-scale datasets. },
        FILE = { :nguyen_etal_icpr18_bayesian - Bayesian Multi Hyperplane Machine for Pattern Recognition.pdf:PDF },
    }
C
2017
  • Dual Discriminator Generative Adversarial Nets
    Tu Dinh Nguyen, Trung Le, Hung Vu and Dinh Phung. In Proc. of the 31st Int. Conf. on Neural Information Processing Systems (NIPS), pages 2667-2677, USA, 2017. [ | | pdf]
    We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial networks (GANs). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus exploiting the complementary statistical properties of these divergences to effectively diversify the estimated density in capturing multiple modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; together with a generator, it retains the analogy of a minimax game, wherein one discriminator rewards high scores for samples from the data distribution whilst the other, conversely, favors data from the generator, and the generator produces data to fool both discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both the KL and reverse KL divergences between the data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode collapse problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach in generating good-quality and diverse samples over baselines, and the capability of our method to scale up to the ImageNet database.
    @INPROCEEDINGS { tu_etal_nips17_d2gan,
        AUTHOR = { Tu Dinh Nguyen and Trung Le and Hung Vu and Dinh Phung },
        TITLE = { Dual Discriminator Generative Adversarial Nets },
        BOOKTITLE = { Proc. of the 31st Int. Conf. on Neural Information Processing Systems (NIPS) },
        YEAR = { 2017 },
        SERIES = { NIPS'17 },
        PAGES = { 2667--2677 },
        ADDRESS = { USA },
        PUBLISHER = { Curran Associates Inc. },
        ABSTRACT = { We propose in this paper a novel approach to tackle the problem of mode collapse encountered in generative adversarial networks (GANs). Our idea is intuitive but proven to be very effective, especially in addressing some key limitations of GAN. In essence, it combines the Kullback-Leibler (KL) and reverse KL divergences into a unified objective function, thus exploiting the complementary statistical properties of these divergences to effectively diversify the estimated density in capturing multiple modes. We term our method dual discriminator generative adversarial nets (D2GAN) which, unlike GAN, has two discriminators; together with a generator, it retains the analogy of a minimax game, wherein one discriminator rewards high scores for samples from the data distribution whilst the other, conversely, favors data from the generator, and the generator produces data to fool both discriminators. We develop theoretical analysis to show that, given the maximal discriminators, optimizing the generator of D2GAN reduces to minimizing both the KL and reverse KL divergences between the data distribution and the distribution induced from the data generated by the generator, hence effectively avoiding the mode collapse problem. We conduct extensive experiments on synthetic and real-world large-scale datasets (MNIST, CIFAR-10, STL-10, ImageNet), where we have made our best effort to compare our D2GAN with the latest state-of-the-art GAN variants in comprehensive qualitative and quantitative evaluations. The experimental results demonstrate the competitive and superior performance of our approach in generating good-quality and diverse samples over baselines, and the capability of our method to scale up to the ImageNet database. },
        ACMID = { 3295027 },
        FILE = { :tu_etal_nips17_d2gan - Dual Discriminator Generative Adversarial Nets.pdf:PDF },
        ISBN = { 978-1-5108-6096-4 },
        LOCATION = { Long Beach, California, USA },
        NUMPAGES = { 11 },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.09.06 },
        URL = { http://dl.acm.org/citation.cfm?id=3294996.3295027 },
    }
C
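    The central idea in the D2GAN abstract above is combining the KL and reverse KL divergences into a unified objective. A toy numpy calculation (illustrating only the divergence combination, not GAN training) shows why this helps: on discrete distributions, the forward KL penalizes a model that misses a mode, the reverse KL penalizes spurious mass, and their sum penalizes a mode-collapsed model far more than one that covers both modes:

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions, with a small floor for stability."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    eps = 1e-12
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

p = np.array([0.49, 0.49, 0.02])            # two-mode "data" distribution
q_collapse = np.array([0.98, 0.01, 0.01])   # mode-collapsed model
q_cover = np.array([0.45, 0.45, 0.10])      # model covering both modes

def symmetric_kl(q):
    # the unified objective combines both divergence directions
    return kl(p, q) + kl(q, p)

assert symmetric_kl(q_cover) < symmetric_kl(q_collapse)
```

    In D2GAN this combination emerges from the two discriminators' rewards rather than being computed explicitly.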
  • GoGP: Fast Online Regression with Gaussian Processes
    Trung Le, Khanh Nguyen, Vu Nguyen, Tu Dinh Nguyen and Dinh Phung. In International Conference on Data Mining (ICDM), 2017. [ | ]
    One of the most challenging current problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that can scale to massive datasets. Our approach is formulated based on an alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We develop theory to guarantee that, with a good convergence rate, our proposed algorithm always produces a (sparse) solution which is close to the true optimum to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately to streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for the online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving an order-of-magnitude computational speedup compared with its rivals under the online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors.
    @INPROCEEDINGS { le_etal_icdm17_gogp,
        AUTHOR = { Trung Le and Khanh Nguyen and Vu Nguyen and Tu Dinh Nguyen and Dinh Phung },
        TITLE = { {GoGP}: Fast Online Regression with Gaussian Processes },
        BOOKTITLE = { International Conference on Data Mining (ICDM) },
        YEAR = { 2017 },
        ABSTRACT = { One of the most challenging current problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that can scale to massive datasets. Our approach is formulated based on an alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We develop theory to guarantee that, with a good convergence rate, our proposed algorithm always produces a (sparse) solution which is close to the true optimum to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately to streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for the online regression task. The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving an order-of-magnitude computational speedup compared with its rivals under the online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors. },
        FILE = { :le_etal_icdm17_gogp - GoGP_ Fast Online Regression with Gaussian Processes.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.09.01 },
    }
C
  • Supervised Restricted Boltzmann Machines
    Tu Dinh Nguyen, Dinh Phung, Viet Huynh and Trung Le. In Proc. of the International Conference on Uncertainty in Artificial Intelligence (UAI), 2017. [ | | pdf]
    We propose in this paper the supervised restricted Boltzmann machine (sRBM), a unified framework which combines the versatility of RBM to simultaneously learn the data representation and to perform supervised learning (i.e., a nonlinear classifier or a nonlinear regressor). Unlike the current state-of-the-art classification formulation proposed for RBM in (Larochelle et al., 2012), our model is a hybrid probabilistic graphical model consisting of a distinguished generative component for data representation and a discriminative component for prediction. While the work of (Larochelle et al., 2012) typically incurs no extra difficulty in inference compared with a standard RBM, our discriminative component, modeled as a directed graphical model, renders MCMC-based inference (e.g., a Gibbs sampler) very slow and impractical to use. To this end, we further develop scalable variational inference for the proposed sRBM for both classification and regression cases. Extensive experiments on real-world datasets show that our sRBM achieves better predictive performance than baseline methods. At the same time, our proposed framework yields learned representations which are more discriminative, hence more interpretable, than those of its counterparts. Besides, our method is probabilistic and capable of generating meaningful data conditioned on specific classes – a topic of great current interest in deep learning aiming at data generation.
    @INPROCEEDINGS { nguyen_etal_uai17supervised,
        AUTHOR = { Tu Dinh Nguyen and Dinh Phung and Viet Huynh and Trung Le },
        TITLE = { Supervised Restricted Boltzmann Machines },
        BOOKTITLE = { Proc. of the International Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2017 },
        ABSTRACT = { We propose in this paper the supervised restricted Boltzmann machine (sRBM), a unified framework which combines the versatility of RBM to simultaneously learn the data representation and to perform supervised learning (i.e., a nonlinear classifier or a nonlinear regressor). Unlike the current state-of-the-art classification formulation proposed for RBM in (Larochelle et al., 2012), our model is a hybrid probabilistic graphical model consisting of a distinguished generative component for data representation and a discriminative component for prediction. While the work of (Larochelle et al., 2012) typically incurs no extra difficulty in inference compared with a standard RBM, our discriminative component, modeled as a directed graphical model, renders MCMC-based inference (e.g., a Gibbs sampler) very slow and impractical to use. To this end, we further develop scalable variational inference for the proposed sRBM for both classification and regression cases. Extensive experiments on real-world datasets show that our sRBM achieves better predictive performance than baseline methods. At the same time, our proposed framework yields learned representations which are more discriminative, hence more interpretable, than those of its counterparts. Besides, our method is probabilistic and capable of generating meaningful data conditioned on specific classes – a topic of great current interest in deep learning aiming at data generation. },
        FILE = { :nguyen_etal_uai17supervised - Supervised Restricted Boltzmann Machines.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.08.29 },
        URL = { http://auai.org/uai2017/proceedings/papers/106.pdf },
    }
C
  • Multilevel clustering via Wasserstein means
    Nhat Ho, XuanLong Nguyen, Mikhail Yurochkin, Hung Bui, Viet Huynh and Dinh Phung. In Proc. of the 34th International Conference on Machine Learning (ICML), pages 1501-1509, 2017. [ | | pdf]
    We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a large hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with the Wasserstein distance metric. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. We also establish consistency properties enjoyed by our estimates of both local and global clusters. Finally, we present experimental results with both synthetic and real data to demonstrate the flexibility and scalability of the proposed approach.
    @INPROCEEDINGS { ho_etal_icml17multilevel,
        AUTHOR = { Nhat Ho and XuanLong Nguyen and Mikhail Yurochkin and Hung Bui and Viet Huynh and Dinh Phung },
        TITLE = { Multilevel clustering via {W}asserstein means },
        BOOKTITLE = { Proc. of the 34th International Conference on Machine Learning (ICML) },
        YEAR = { 2017 },
        VOLUME = { 70 },
        SERIES = { ICML'17 },
        PAGES = { 1501--1509 },
        PUBLISHER = { JMLR.org },
        ABSTRACT = { We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a large hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with the Wasserstein distance metric. We propose a number of variants of this problem, which admit fast optimization algorithms, by exploiting the connection to the problem of finding Wasserstein barycenters. We also establish consistency properties enjoyed by our estimates of both local and global clusters. Finally, we present experimental results with both synthetic and real data to demonstrate the flexibility and scalability of the proposed approach. },
        ACMID = { 3305536 },
        FILE = { :ho_etal_icml17multilevel - Multilevel Clustering Via Wasserstein Means.pdf:PDF },
        LOCATION = { Sydney, NSW, Australia },
        NUMPAGES = { 9 },
        URL = { http://dl.acm.org/citation.cfm?id=3305381.3305536 },
    }
C
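    The Wasserstein-means entry above optimizes over discrete probability measures via Wasserstein barycenters. In one dimension with equal sample counts, the W2 barycenter has a closed form (average the order statistics), which makes for a minimal illustration; the toy data and function name below are illustrative, not the paper's general algorithm:

```python
import numpy as np

def w2_barycenter_1d(samples_list):
    """W2 barycenter of 1-D empirical measures with equal sample counts.
    Sorting each sample and averaging the order statistics is the
    closed-form quantile-averaging solution in one dimension."""
    sorted_samples = np.stack([np.sort(np.asarray(s, float)) for s in samples_list])
    return sorted_samples.mean(axis=0)

# Two empirical measures with three atoms each; sorted they are
# [0, 1, 2] and [0, 2, 4], so the barycenter atoms are [0.0, 1.5, 3.0].
bary = w2_barycenter_1d([[0.0, 1.0, 2.0], [4.0, 2.0, 0.0]])
```

    In higher dimensions (as in the paper) the barycenter has no such closed form and is computed by iterative optimization.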
  • Approximation Vector Machines for Large-scale Online Learning
    Trung Le, Tu Dinh Nguyen, Vu Nguyen and Dinh Q. Phung. Journal of Machine Learning Research (JMLR), 2017. [ | | pdf]
    One of the most challenging problems in kernel online learning is to bound the model size and to promote model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade performance. In this paper, we propose the Approximation Vector Machine (AVM), a model that can simultaneously encourage sparsity and safeguard against the risk of compromising performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor, the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximation and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions including Hinge, smooth Hinge, and Logistic for the classification task, and l1, l2, and ϵ-insensitive for the regression task. We conducted extensive experiments for the classification task in batch and online modes, and the regression task in online mode, over several benchmark datasets. The results show that our proposed AVM achieved predictive performance comparable with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM to bound the model size.
    @ARTICLE { le_etal_jmlr17approximation,
        AUTHOR = { Trung Le and Tu Dinh Nguyen and Vu Nguyen and Dinh Q. Phung },
        TITLE = { Approximation Vector Machines for Large-scale Online Learning },
        JOURNAL = { Journal of Machine Learning Research (JMLR) },
        YEAR = { 2017 },
        ABSTRACT = { One of the most challenging problems in kernel online learning is to bound the model size and to promote model sparsity. Sparse models not only improve computation and memory usage, but also enhance the generalization capacity, a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade performance. In this paper, we propose the Approximation Vector Machine (AVM), a model that can simultaneously encourage sparsity and safeguard against the risk of compromising performance. When an incoming instance arrives, we approximate this instance by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by its nearby neighbor, the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis to characterize the gap between the approximation and optimal solutions. This gap crucially depends on the frequency of approximation and the predefined threshold. We perform the convergence analysis for a wide spectrum of loss functions including Hinge, smooth Hinge, and Logistic for the classification task, and l1, l2, and ϵ-insensitive for the regression task. We conducted extensive experiments for the classification task in batch and online modes, and the regression task in online mode, over several benchmark datasets. The results show that our proposed AVM achieved predictive performance comparable with current state-of-the-art methods while simultaneously achieving significant computational speed-up due to the ability of the proposed AVM to bound the model size. },
        FILE = { :le_etal_jmlr17approximation - Approximation Vector Machines for Large Scale Online Learning.pdf:PDF },
        KEYWORDS = { kernel, online learning, large-scale machine learning, sparsity, big data, core set, stochastic gradient descent, convergence analysis },
        URL = { https://arxiv.org/abs/1604.06518 },
    }
J
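    The approximation step at the heart of the AVM entry above (represent an incoming instance by a stored neighbor when one lies within a threshold, otherwise store the instance) can be sketched as follows. Class and parameter names are illustrative; the actual AVM couples this step with stochastic gradient updates on a kernel model:

```python
import numpy as np

class CoreSet:
    """Sketch of the AVM approximation step: a new instance is represented
    by its nearest stored point if that point lies within delta, otherwise
    it is added. The stored points stay pairwise farther than delta apart,
    which is what bounds the model size."""

    def __init__(self, delta):
        self.delta = delta
        self.points = []

    def approximate(self, x):
        x = np.asarray(x, float)
        if self.points:
            dists = [np.linalg.norm(x - p) for p in self.points]
            i = int(np.argmin(dists))
            if dists[i] <= self.delta:
                return self.points[i]   # reuse the nearby stored point
        self.points.append(x)           # otherwise grow the model
        return x

cs = CoreSet(delta=0.5)
cs.approximate([0.0, 0.0])
cs.approximate([0.1, 0.0])   # within delta of the first point: no growth
```

    On bounded input domains the number of stored points is therefore bounded by a delta-packing of the domain, independent of the stream length.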
  • Discriminative Bayesian Nonparametric Clustering
    Vu Nguyen, Dinh Phung, Trung Le, Svetha Venkatesh and Hung Bui. In Proc. of International Joint Conference on Artificial Intelligence (IJCAI), 2017. [ | | pdf]
    We propose a general framework for discriminative Bayesian nonparametric clustering to promote inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels resulting from a probabilistic discriminative model. This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with the traditional generative BNP models.
    @INPROCEEDINGS { nguyen_etal_ijcai17discriminative,
        AUTHOR = { Vu Nguyen and Dinh Phung and Trung Le and Svetha Venkatesh and Hung Bui },
        TITLE = { Discriminative Bayesian Nonparametric Clustering },
        BOOKTITLE = { Proc. of International Joint Conference on Artificial Intelligence (IJCAI) },
        YEAR = { 2017 },
        ABSTRACT = { We propose a general framework for discriminative Bayesian nonparametric clustering to promote inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels resulting from a probabilistic discriminative model. This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with the traditional generative BNP models. },
        FILE = { :nguyen_etal_ijcai17discriminative - Discriminative Bayesian Nonparametric Clustering.pdf:PDF },
        URL = { https://www.ijcai.org/proceedings/2017/355 },
    }
C
  • Large-scale Online Kernel Learning with Random Feature Reparameterization
    Tu Dinh Nguyen, Trung Le, Hung Bui and Dinh Phung. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), 2017. [ | | pdf]
    A typical online kernel learning method faces two fundamental issues: the complexity in dealing with a huge number of observed data points (a.k.a. the curse of kernelization) and the difficulty in learning kernel parameters, which are often assumed to be fixed. Random Fourier features are a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bochner’s theorem, and allow the model to be maintained directly in the random feature space with a fixed dimension, hence the model size remains constant w.r.t. data size. We further introduce in this paper the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning to address both aforementioned challenges. Our initial intuition comes from the so-called ‘reparameterization trick’ [Kingma and Welling, 2014] to lift the source of randomness of Fourier components to another space which can be independently sampled, so that the stochastic gradient of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel, and a new, tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model under an online learning setting. We then conduct extensive experiments on several large-scale datasets, demonstrating that our work achieves state-of-the-art performance in both learning efficacy and efficiency.
    @INPROCEEDINGS { tu_etal_ijcai17_rrf,
        AUTHOR = { Tu Dinh Nguyen and Trung Le and Hung Bui and Dinh Phung },
        BOOKTITLE = { Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI) },
        TITLE = { Large-scale Online Kernel Learning with Random Feature Reparameterization },
        YEAR = { 2017 },
        SERIES = { IJCAI'17 },
        ABSTRACT = { A typical online kernel learning method faces two fundamental issues: the complexity in dealing with a huge number of observed data points (a.k.a. the curse of kernelization) and the difficulty in learning kernel parameters, which are often assumed to be fixed. Random Fourier features are a recent and effective approach to address the former by approximating the shift-invariant kernel function via Bochner’s theorem, and allow the model to be maintained directly in the random feature space with a fixed dimension, hence the model size remains constant w.r.t. data size. We further introduce in this paper the reparameterized random feature (RRF), a random feature framework for large-scale online kernel learning to address both aforementioned challenges. Our initial intuition comes from the so-called ‘reparameterization trick’ [Kingma and Welling, 2014] to lift the source of randomness of Fourier components to another space which can be independently sampled, so that the stochastic gradient of the kernel parameters can be analytically derived. We develop a well-founded underlying theory for our method, including a general way to reparameterize the kernel, and a new, tighter error bound on the approximation quality. This view further inspires a direct application of stochastic gradient descent for updating our model under an online learning setting. We then conduct extensive experiments on several large-scale datasets, demonstrating that our work achieves state-of-the-art performance in both learning efficacy and efficiency. },
        FILE = { :tu_etal_ijcai17_rrf - Large Scale Online Kernel Learning with Random Feature Reparameterization.pdf:PDF },
        LOCATION = { Melbourne, Australia },
        NUMPAGES = { 7 },
        URL = { https://www.ijcai.org/proceedings/2017/354 },
    }
C
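The random Fourier feature construction that RRF builds on can be sketched in a few lines. This is a generic illustration of Bochner's-theorem-based RBF kernel approximation, not the authors' RRF implementation; all names and parameter choices below are illustrative.

```python
import numpy as np

def random_fourier_features(X, n_features, gamma, seed=0):
    """Approximate the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).
    By Bochner's theorem, sampling frequencies W from the kernel's
    spectral density N(0, 2*gamma*I) yields features z(x) with
    E[z(x) . z(y)] ~= k(x, y)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(42)
X = rng.normal(size=(5, 3))
Z = random_fourier_features(X, n_features=5000, gamma=0.5)
approx = Z @ Z.T  # approximate kernel matrix in the fixed-dimension feature space
exact = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
```

With enough random features the inner products `Z @ Z.T` converge to the exact kernel matrix; the paper's contribution is to reparameterize the sampling of `W` so that gradients with respect to the kernel parameters (here `gamma`) become analytically tractable for online learning.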
  • Column Networks for Collective Classification
    Pham, Trang, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In The Thirty-First AAAI Conference on Artificial Intelligence (AAAI), 2017. [ | | pdf]
    Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds great promise to produce better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, linear in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all applications, CLN demonstrates higher accuracy than state-of-the-art rivals.
    @CONFERENCE { pham_etal_aaai17column,
        AUTHOR = { Pham, Trang and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Column Networks for Collective Classification },
        BOOKTITLE = { The Thirty-First AAAI Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2017 },
        ABSTRACT = { Relational learning deals with data that are characterized by relational structures. An important task is collective classification, which is to jointly classify networked objects. While it holds great promise to produce better accuracy than non-collective classifiers, collective classification is computationally challenging and has not leveraged the recent breakthroughs of deep learning. We present Column Network (CLN), a novel deep learning model for collective classification in multi-relational domains. CLN has many desirable theoretical properties: (i) it encodes multi-relations between any two instances; (ii) it is deep and compact, allowing complex functions to be approximated at the network level with a small set of free parameters; (iii) local and relational features are learned simultaneously; (iv) long-range, higher-order dependencies between instances are supported naturally; and (v) crucially, learning and inference are efficient, linear in the size of the network and the number of relations. We evaluate CLN on multiple real-world applications: (a) delay prediction in software projects, (b) PubMed Diabetes publication classification and (c) film genre classification. In all applications, CLN demonstrates higher accuracy than state-of-the-art rivals. },
        COMMENT = { Accepted },
        FILE = { :pham_etal_aaai17column - Column Networks for Collective Classification.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.11.14 },
        URL = { https://arxiv.org/abs/1609.04508 },
    }
C
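A single collective update in the spirit of properties (i)–(iii) above can be sketched as follows. This is a hypothetical reading (own features plus mean neighbour context per relation); the parameter names (`W`, `V`, `b`) and the ReLU nonlinearity are assumed for illustration, not taken from the paper.

```python
import numpy as np

def cln_layer(H, adj_by_rel, W, V, b):
    """One layer-wise update: each node combines its own hidden state with
    the mean hidden state of its neighbours under each relation, so the
    cost stays linear in the number of edges and relations."""
    Z = H @ W + b
    for r, A in enumerate(adj_by_rel):           # A: (n, n) adjacency of relation r
        deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
        Z += ((A @ H) / deg) @ V[r]              # mean neighbour context, relation r
    return np.maximum(Z, 0.0)                    # nonlinearity (ReLU assumed)

rng = np.random.default_rng(0)
n, d = 4, 3
H = rng.normal(size=(n, d))                      # node hidden states
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)        # one toy relation
H1 = cln_layer(H, [A], rng.normal(size=(d, d)),
               [rng.normal(size=(d, d))], np.zeros(d))
```

Stacking several such layers lets information propagate along longer paths, which is one way to realise the long-range dependencies in property (iv).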
  • Forward-Backward Smoothing for Hidden Markov Models of Point Pattern Data
    Nhan Dam, Dinh Phung, Ba-Ngu Vo and Viet Huynh. In 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 252-261, Tokyo, Japan, October 2017. [ | ]
    @INPROCEEDINGS { dam_etal_dsaa17forward,
        TITLE = { Forward-Backward Smoothing for Hidden {M}arkov Models of Point Pattern Data },
        AUTHOR = { Nhan Dam and Dinh Phung and Ba-Ngu Vo and Viet Huynh },
        BOOKTITLE = { 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA) },
        MONTH = { October },
        YEAR = { 2017 },
        PAGES = { 252-261 },
        ADDRESS = { Tokyo, Japan },
        FILE = { :dam_etal_dsaa17forward - Forward Backward Smoothing for Hidden Markov Models of Point Pattern Data.pdf:PDF },
        OWNER = { ndam },
        TIMESTAMP = { 2017.08.28 },
    }
C
  • Animal Recognition and Identification with Deep Convolutional Neural Networks for Automated Wildlife Monitoring
    Hung Nguyen, Sarah J. Maclagan, Tu Dinh Nguyen, Thin Nguyen, Paul Flemons, Kylie Andrews, Euan G. Ritchie and Dinh Phung. In Proceedings of the IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2017. (Honorable Mention Application Paper). [ | ]
    Efficient and reliable monitoring of wild animals in their natural habitats is essential to inform conservation and management decisions. Automatic covert cameras or “camera traps” are becoming an increasingly popular tool for wildlife monitoring due to their effectiveness and reliability in collecting data on wildlife unobtrusively, continuously and in large volume. However, processing such a large volume of images and videos captured from camera traps manually is extremely expensive, time-consuming and also monotonous. This presents a major obstacle for scientists and ecologists monitoring wildlife in an open environment. Leveraging recent advances in deep learning techniques in computer vision, we propose in this paper a framework to build automated animal recognition in the wild, aiming at an automated wildlife monitoring system. In particular, we use a single-labeled dataset from the Wildlife Spotter project, done by citizen scientists, and state-of-the-art deep convolutional neural network architectures, to train a computational system capable of filtering animal images and identifying species automatically. Our experimental results achieved an accuracy of 96.6% for the task of detecting images containing animals, and 90.4% for identifying the three most common species among the set of images of wild animals taken in South-central Victoria, Australia, demonstrating the feasibility of building fully automated wildlife observation. This, in turn, can speed up research findings, enable more efficient citizen science-based monitoring systems and subsequent management decisions, and has the potential to make significant impacts on ecology and camera trap image analysis.
    @INPROCEEDINGS { hung_etal_dsaa17animal,
        AUTHOR = { Hung Nguyen and Sarah J. Maclagan and Tu Dinh Nguyen and Thin Nguyen and Paul Flemons and Kylie Andrews and Euan G. Ritchie and Dinh Phung },
        TITLE = { Animal Recognition and Identification with Deep Convolutional Neural Networks for Automated Wildlife Monitoring },
        BOOKTITLE = { Proceedings of the IEEE International Conference on Data Science and Advanced Analytics (DSAA) },
        YEAR = { 2017 },
        NOTE = { Honorable Mention Application Paper },
        ABSTRACT = { Efficient and reliable monitoring of wild animals in their natural habitats is essential to inform conservation and management decisions. Automatic covert cameras or “camera traps” are becoming an increasingly popular tool for wildlife monitoring due to their effectiveness and reliability in collecting data on wildlife unobtrusively, continuously and in large volume. However, processing such a large volume of images and videos captured from camera traps manually is extremely expensive, time-consuming and also monotonous. This presents a major obstacle for scientists and ecologists monitoring wildlife in an open environment. Leveraging recent advances in deep learning techniques in computer vision, we propose in this paper a framework to build automated animal recognition in the wild, aiming at an automated wildlife monitoring system. In particular, we use a single-labeled dataset from the Wildlife Spotter project, done by citizen scientists, and state-of-the-art deep convolutional neural network architectures, to train a computational system capable of filtering animal images and identifying species automatically. Our experimental results achieved an accuracy of 96.6% for the task of detecting images containing animals, and 90.4% for identifying the three most common species among the set of images of wild animals taken in South-central Victoria, Australia, demonstrating the feasibility of building fully automated wildlife observation. This, in turn, can speed up research findings, enable more efficient citizen science-based monitoring systems and subsequent management decisions, and has the potential to make significant impacts on ecology and camera trap image analysis. },
        FILE = { :hung_etal_dsaa17animal - Animal Recognition and Identification with Deep Convolutional Neural Networks for Automated Wildlife Monitoring.pdf:PDF },
        OWNER = { hung },
        TIMESTAMP = { 2017.08.28 },
    }
C
  • Prediction of Population Health Indices from Social Media using Kernel-based Textual and Temporal Features
    Nguyen, Thin, Nguyen, Duc Thanh, Larsen, Mark E., O'Dea, Bridianne, Yearwood, John, Phung, Dinh, Venkatesh, Svetha and Christensen, Helen. In Proceedings of the International Conference on World Wide Web (WWW), 2017. [ | | pdf]
    Since 1984, the US has annually conducted the Behavioral Risk Factor Surveillance System (BRFSS) surveys to capture either health behaviors, such as drinking or smoking, or health outcomes, including mental, physical, and generic health, of the population. Although this kind of information at a population level, such as US counties, is important for local governments to identify local needs, traditional datasets may take years to collate and to become publicly available. Geocoded social media data can provide an alternative reflection of local health trends. In this work, to predict the percentage of adults in a county reporting “insufficient sleep”, a health behavior, and, at the same time, their health outcomes, novel textual and temporal features are proposed. The proposed textual features are defined at mid-level and can be applied on top of various low-level textual features. They are computed via kernel functions on underlying features and encode the relationships between individual underlying features over a population. To further enrich the predictive ability of the health indices, the textual features are augmented with temporal information. We evaluated the proposed features and compared them with existing features using a dataset collected from the BRFSS. Experimental results show that the combination of kernel-based textual features and temporal information predicts well both the health behavior (with best performance at rho=0.82) and health outcomes (with best performance at rho=0.78), demonstrating the capability of social media data in the prediction of population health indices. The results also show that our proposed features gained higher correlation coefficients than the existing ones, increasing the correlation coefficient by up to 0.16, suggesting the potential of the approach in a wide spectrum of applications on data analytics at population levels.
    @INPROCEEDINGS { nguyen_etal_www17prediction,
        AUTHOR = { Nguyen, Thin and Nguyen, Duc Thanh and Larsen, Mark E. and O'Dea, Bridianne and Yearwood, John and Phung, Dinh and Venkatesh, Svetha and Christensen, Helen },
        TITLE = { Prediction of Population Health Indices from Social Media using Kernel-based Textual and Temporal Features },
        BOOKTITLE = { Proceedings of the International Conference on World Wide Web (WWW) },
        YEAR = { 2017 },
        ABSTRACT = { Since 1984, the US has annually conducted the Behavioral Risk Factor Surveillance System (BRFSS) surveys to capture either health behaviors, such as drinking or smoking, or health outcomes, including mental, physical, and generic health, of the population. Although this kind of information at a population level, such as US counties, is important for local governments to identify local needs, traditional datasets may take years to collate and to become publicly available. Geocoded social media data can provide an alternative reflection of local health trends. In this work, to predict the percentage of adults in a county reporting “insufficient sleep”, a health behavior, and, at the same time, their health outcomes, novel textual and temporal features are proposed. The proposed textual features are defined at mid-level and can be applied on top of various low-level textual features. They are computed via kernel functions on underlying features and encode the relationships between individual underlying features over a population. To further enrich the predictive ability of the health indices, the textual features are augmented with temporal information. We evaluated the proposed features and compared them with existing features using a dataset collected from the BRFSS. Experimental results show that the combination of kernel-based textual features and temporal information predicts well both the health behavior (with best performance at rho=0.82) and health outcomes (with best performance at rho=0.78), demonstrating the capability of social media data in the prediction of population health indices. The results also show that our proposed features gained higher correlation coefficients than the existing ones, increasing the correlation coefficient by up to 0.16, suggesting the potential of the approach in a wide spectrum of applications on data analytics at population levels. },
        FILE = { :nguyen_etal_www17prediction - Prediction of Population Health Indices from Social Media Using Kernel Based Textual and Temporal Features.pdf:PDF },
        OWNER = { thinng },
        TIMESTAMP = { 2017.03.25 },
        URL = { http://dl.acm.org/citation.cfm?id=3054136 },
    }
C
  • Latent Sentiment Topic Modelling and Nonparametric Discovery of Online Mental Health-related Communities
    Bo Dao, Thin Nguyen, Svetha Venkatesh and Dinh Phung. International Journal of Data Science and Analytics, 4:209–231, November 2017. [ | | pdf]
    Social media are an online means of interaction among individuals. People are increasingly using social media, especially online communities, to discuss health concerns and seek support. Understanding the topics, sentiment, and structures of these communities informs important aspects of health-related conditions. There has been growing research interest in analyzing online mental health communities; however, analysis of these communities with health concerns has been limited. This paper investigates and identifies latent meta-groups of online communities with and without mental health-related conditions, including depression and autism. Large datasets from online communities were crawled. We analyse sentiment-based, psycholinguistic-based and topic-based features from blog posts made by members of these online communities. The work focuses on using nonparametric methods to infer latent topics automatically from the corpus of affective words in the blog posts. The visualization of the discovered meta-communities in their use of latent topics shows a difference between the groups. This presents evidence of the emotion-bearing difference in online mental health-related communities, suggesting a possible angle for support and intervention. The methodology might offer potential machine learning techniques for research and practice in psychiatry.
    @ARTICLE { Dao_etal_17Latent,
        AUTHOR = { Bo Dao and Thin Nguyen and Svetha Venkatesh and Dinh Phung },
        TITLE = { Latent Sentiment Topic Modelling and Nonparametric Discovery of Online Mental Health-related Communities },
        JOURNAL = { International Journal of Data Science and Analytics },
        YEAR = { 2017 },
        VOLUME = { 4 },
        PAGES = { 209--231 },
        MONTH = { November },
        ABSTRACT = { Social media are an online means of interaction among individuals. People are increasingly using social media, especially online communities, to discuss health concerns and seek support. Understanding the topics, sentiment, and structures of these communities informs important aspects of health-related conditions. There has been growing research interest in analyzing online mental health communities; however, analysis of these communities with health concerns has been limited. This paper investigates and identifies latent meta-groups of online communities with and without mental health-related conditions, including depression and autism. Large datasets from online communities were crawled. We analyse sentiment-based, psycholinguistic-based and topic-based features from blog posts made by members of these online communities. The work focuses on using nonparametric methods to infer latent topics automatically from the corpus of affective words in the blog posts. The visualization of the discovered meta-communities in their use of latent topics shows a difference between the groups. This presents evidence of the emotion-bearing difference in online mental health-related communities, suggesting a possible angle for support and intervention. The methodology might offer potential machine learning techniques for research and practice in psychiatry. },
        FILE = { :Dao_etal_17Latent - Latent Sentiment Topic Modelling and Nonparametric Discovery of Online Mental Health Related Communities.pdf:PDF },
        OWNER = { thinng },
        TIMESTAMP = { 2017.08.31 },
        URL = { https://link.springer.com/article/10.1007/s41060-017-0073-y },
    }
J


  • Estimating support scores of autism communities in large-scale Web information systems
    Thin Nguyen, Hung Nguyen, Svetha Venkatesh and Dinh Phung. In Proceedings of the International Conference on Web Information Systems Engineering (WISE), Springer, 2017. [ | ]
    Individuals with Autism Spectrum Disorder (ASD) have been shown to prefer communication at a socio-spatial distance. So while rarely found in the real world, autism communities are popular in Web-based forums, convenient for people with ASD to seek and share health-related information. Reddit is one such avenue for people of common interest to connect, forming communities of specific interest, namely subreddits. This work aims to estimate support scores provided by a popular subreddit interested in ASD – www.reddit.com/r/aspergers. The scores were measured in both the quantity and quality of the conversations in the forum, including conversational involvement, emotional, and informational support. The support scores of the subreddit Aspergers were compared with those of an average subreddit derived from the entire Reddit, represented by two big corpora of approximately 200 million Reddit posts and 1.66 billion Reddit comments. The ASD subreddit was found to be a supportive community, having far higher support scores than the average subreddit. Apache Spark, an advanced cluster computing framework, is employed to speed up processing of the large corpora. Scalable machine learning techniques implemented in Spark help discriminate the content made in Aspergers versus other subreddits and automatically discover linguistic predictors of ASD within minutes, providing timely reports.
    @INCOLLECTION { Nguyen_etal_17Estimating,
        AUTHOR = { Thin Nguyen and Hung Nguyen and Svetha Venkatesh and Dinh Phung },
        TITLE = { Estimating support scores of autism communities in large-scale Web information systems },
        BOOKTITLE = { Proceedings of the International Conference on Web Information Systems Engineering (WISE) },
        PUBLISHER = { Springer },
        YEAR = { 2017 },
        SERIES = { Lecture Notes in Computer Science },
        ABSTRACT = { Individuals with Autism Spectrum Disorder (ASD) have been shown to prefer communication at a socio-spatial distance. So while rarely found in the real world, autism communities are popular in Web-based forums, convenient for people with ASD to seek and share health-related information. Reddit is one such avenue for people of common interest to connect, forming communities of specific interest, namely subreddits. This work aims to estimate support scores provided by a popular subreddit interested in ASD – www.reddit.com/r/aspergers. The scores were measured in both the quantity and quality of the conversations in the forum, including conversational involvement, emotional, and informational support. The support scores of the subreddit Aspergers were compared with those of an average subreddit derived from the entire Reddit, represented by two big corpora of approximately 200 million Reddit posts and 1.66 billion Reddit comments. The ASD subreddit was found to be a supportive community, having far higher support scores than the average subreddit. Apache Spark, an advanced cluster computing framework, is employed to speed up processing of the large corpora. Scalable machine learning techniques implemented in Spark help discriminate the content made in Aspergers versus other subreddits and automatically discover linguistic predictors of ASD within minutes, providing timely reports. },
        FILE = { :Nguyen_etal_17Estimating - Estimating Support Scores of Autism Communities in Large Scale Web Information Systems.pdf:PDF },
        LANGUAGE = { English },
        OWNER = { thinng },
        TIMESTAMP = { 2017.08.28 },
    }
BC
  • Kernel-based features for predicting population health indices from geocoded social media data
    Thin Nguyen, Mark E. Larsen, Bridianne O'Dea, Duc Thanh Nguyen, John Yearwood, Dinh Phung, Svetha Venkatesh and Helen Christensen. Decision Support Systems, 2017. [ | | pdf]
    When using tweets to predict a population health index, due to the large scale of the data, aggregating tweets by population has been a popular practice for learning features to characterize the population. This alleviates the computational cost of extracting features from each individual tweet. On the other hand, much information on the population could be lost, as the distribution of textual features in a population could be important for identifying the health index of that population. In addition, there could be relationships between features, and those relationships could also convey predictive information about the health index. In this paper, we propose mid-level features, namely kernel-based features, for the prediction of health indices of populations from social media data. The kernel-based features are extracted on the distributions of textual features over population tweets and encode the relationships between individual textual features in a kernel function. We implemented our features using three different kernel functions and applied them to two case studies of population health prediction: across-year prediction and across-county prediction. The kernel-based features were evaluated and compared with existing features on a dataset collected from the Behavioral Risk Factor Surveillance System. Experimental results show that the kernel-based features gained significantly higher prediction performance than existing techniques, by up to 16.3%, suggesting the potential and applicability of the proposed features in a wide spectrum of applications on data analytics at population levels.
    @ARTICLE { Nguyen_etal_17Kernel,
        AUTHOR = { Thin Nguyen and Mark E. Larsen and Bridianne O'Dea and Duc Thanh Nguyen and John Yearwood and Dinh Phung and Svetha Venkatesh and Helen Christensen },
        TITLE = { Kernel-based features for predicting population health indices from geocoded social media data },
        JOURNAL = { Decision Support Systems },
        YEAR = { 2017 },
        VOLUME = { 0 },
        NUMBER = { 0 },
        PAGES = { 1-34 },
        ABSTRACT = { When using tweets to predict a population health index, due to the large scale of the data, aggregating tweets by population has been a popular practice for learning features to characterize the population. This alleviates the computational cost of extracting features from each individual tweet. On the other hand, much information on the population could be lost, as the distribution of textual features in a population could be important for identifying the health index of that population. In addition, there could be relationships between features, and those relationships could also convey predictive information about the health index. In this paper, we propose mid-level features, namely kernel-based features, for the prediction of health indices of populations from social media data. The kernel-based features are extracted on the distributions of textual features over population tweets and encode the relationships between individual textual features in a kernel function. We implemented our features using three different kernel functions and applied them to two case studies of population health prediction: across-year prediction and across-county prediction. The kernel-based features were evaluated and compared with existing features on a dataset collected from the Behavioral Risk Factor Surveillance System. Experimental results show that the kernel-based features gained significantly higher prediction performance than existing techniques, by up to 16.3%, suggesting the potential and applicability of the proposed features in a wide spectrum of applications on data analytics at population levels. },
        FILE = { :Nguyen_etal_17Kernel - Kernel Based Features for Predicting Population Health Indices from Geocoded Social Media Data.pdf:PDF },
        OWNER = { thinng },
        TIMESTAMP = { 2017.07.01 },
        URL = { http://www.sciencedirect.com/science/article/pii/S0167923617301227 },
    }
J
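The kernel-based mid-level features can be sketched roughly as follows, under my own simplifying assumptions (not stated verbatim in the abstract): each low-level textual feature is represented by its distribution over a population's tweets, and pairwise relationships between features are encoded with an RBF kernel.

```python
import numpy as np

def kernel_midlevel_features(tweet_features, gamma=1.0):
    """Sketch: rows are one population's tweets, columns are low-level
    textual features. Each column is a feature's distribution over the
    population; the RBF kernel between columns encodes the relationship
    between every pair of textual features, and the flattened kernel
    matrix serves as the population's mid-level feature vector."""
    cols = np.asarray(tweet_features, dtype=float).T       # (d, n_tweets)
    sq = ((cols[:, None] - cols[None]) ** 2).sum(axis=-1)  # pairwise sq. distances
    K = np.exp(-gamma * sq)                                # (d, d) kernel matrix
    return K[np.triu_indices_from(K)]                      # upper triangle, flattened

rng = np.random.default_rng(1)
pop = rng.random((50, 6))                # 50 tweets, 6 low-level features
phi = kernel_midlevel_features(pop)      # length d*(d+1)/2 = 21
```

Swapping the RBF kernel for other kernel functions changes only the `K` line, which mirrors the paper's evaluation of three different kernels.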
  • Estimation of the prevalence of adverse drug reactions from social media
    Thin Nguyen, Mark Larsen, Bridianne O'Dea, Dinh Phung, Svetha Venkatesh and Helen Christensen. International Journal of Medical Informatics (IJMI), 2017. [ | | pdf]
    This work aims to estimate the degree of adverse drug reactions (ADR) for psychiatric medications from social media, including Twitter, Reddit, and LiveJournal. Advances in lightning-fast cluster computing were employed to process large-scale data, consisting of 6.4 terabytes of data containing 3.8 billion records from all the media. Rates of ADR were quantified using the SIDER database of drugs and side-effects, and an estimated ADR rate was based on the prevalence of discussion in the social media corpora. Agreement between these measures for a sample of ten popular psychiatric drugs was evaluated using the Pearson correlation coefficient, r, with values between 0.08 and 0.50. Word2vec, a novel neural learning framework, was utilized to improve the coverage of variants of ADR terms in the unstructured text by identifying syntactically or semantically similar terms. Improved correlation coefficients, between 0.29 and 0.59, demonstrate the capability of advanced techniques in machine learning to aid in the discovery of meaningful patterns from medical data, and social media data, at scale.
    @ARTICLE { nguyen_etal_jmi17estimation,
        AUTHOR = { Thin Nguyen and Mark Larsen and Bridianne O'Dea and Dinh Phung and Svetha Venkatesh and Helen Christensen },
        TITLE = { Estimation of the prevalence of adverse drug reactions from social media },
        JOURNAL = { International Journal of Medical Informatics (IJMI) },
        YEAR = { 2017 },
        PAGES = { 1--17 },
        ABSTRACT = { This work aims to estimate the degree of adverse drug reactions (ADR) for psychiatric medications from social media, including Twitter, Reddit, and LiveJournal. Advances in lightning-fast cluster computing were employed to process large-scale data, consisting of 6.4 terabytes of data containing 3.8 billion records from all the media. Rates of ADR were quantified using the SIDER database of drugs and side-effects, and an estimated ADR rate was based on the prevalence of discussion in the social media corpora. Agreement between these measures for a sample of ten popular psychiatric drugs was evaluated using the Pearson correlation coefficient, r, with values between 0.08 and 0.50. Word2vec, a novel neural learning framework, was utilized to improve the coverage of variants of ADR terms in the unstructured text by identifying syntactically or semantically similar terms. Improved correlation coefficients, between 0.29 and 0.59, demonstrate the capability of advanced techniques in machine learning to aid in the discovery of meaningful patterns from medical data, and social media data, at scale. },
        FILE = { :nguyen_etal_jmi17estimation - Estimation of the Prevalence of Adverse Drug Reactions from Social Media.pdf:PDF },
        URL = { http://www.sciencedirect.com/science/article/pii/S1386505617300746 },
    }
J


  • Hierarchical semi-Markov conditional random fields for deep recursive sequential data
    Truyen Tran, Dinh Phung, Hung H. Bui and Svetha Venkatesh. Artificial Intelligence (AIJ), Feb. 2017. [ | | pdf]
    We present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of linear-chain conditional random fields to model deep nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We develop numerical scaling procedures that handle the overflow problem. We show that the HSCRF can be reduced to the semi-Markov conditional random fields. Finally, we demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. The HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases.
    @ARTICLE { tran_etal_aij17hierarchical,
        AUTHOR = { Truyen Tran and Dinh Phung and Hung H. Bui and Svetha Venkatesh },
        TITLE = { Hierarchical semi-Markov conditional random fields for deep recursive sequential data },
        JOURNAL = { Artificial Intelligence (AIJ) },
        YEAR = { 2017 },
        MONTH = { Feb. },
        ABSTRACT = { We present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of linear-chain conditional random fields to model deep nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We develop numerical scaling procedures that handle the overflow problem. We show that the HSCRF can be reduced to the semi-Markov conditional random fields. Finally, we demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. The HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. },
        FILE = { :tran_etal_aij17hierarchical - Hierarchical Semi Markov Conditional Random Fields for Deep Recursive Sequential Data.pdf:PDF },
        KEYWORDS = { Deep nested sequential processes, Hierarchical semi-Markov conditional random field, Partial labelling, Constrained inference, Numerical scaling },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.02.21 },
        URL = { http://www.sciencedirect.com/science/article/pii/S0004370217300231 },
    }
J
  • See my thesis (chapter 5) for an equivalent directed graphical model, which is the precursor of this work and where I describe the Asymmetric Inside-Outside (AIO) algorithm in great detail. A brief version for the directed case also appeared in this AAAI'04 paper. The idea of semi-Markov duration modelling has also been addressed for the directed case in these CVPR05 and AIJ09 papers.
  • Streaming Clustering with Bayesian Nonparametric Models
    Viet Huynh and Dinh Phung. Neurocomputing, 258:52-62, October 2017. [ | | pdf]
    Bayesian nonparametric (BNP) models are theoretically suitable for learning streaming data due to their complexity relaxation to growing data observed over time. There is a rich body of literature on developing efficient approximate methods for posterior inference in BNP models, typically dominated by MCMC. However, very limited work has addressed posterior inference in a streaming fashion, which is important to fully realize the potential of BNP models applied to real-world tasks. The main challenge resides in developing a one-pass posterior update which is consistent with the data streamed over time (i.e., data is scanned only once), which general MCMC methods fail to address. On the other hand, Dirichlet process-based mixture models are the most fundamental building blocks in the field of BNP. To this end, we develop in this paper a class of variational methods suitable for posterior inference of the Dirichlet process mixture (DPM) models where both the posterior update and data are presented in a streaming setting. We first propose new methods to advance existing variational-based inference approaches for BNP to allow the variational distributions to grow over time, hence overcoming an important limitation of current methods in imposing parametric, truncated restrictions on the variational distributions. This results in two new methods, namely truncation-free variational Bayes (TFVB) and truncation-free maximization expectation (TFME), where the latter further supports hard clustering. These inference methods form the foundation for our streaming inference algorithm, where we further adapt the recent Streaming Variational Bayes proposed in [1] to our task. To demonstrate our framework on real-world tasks whose datasets are often heterogeneous, we develop one more theoretical extension for our model to handle assorted data where each observation consists of different data types. Our experiments with automatically learning the number of clusters demonstrate the comparable inference capability of our framework in comparison with truncated variational inference algorithms for both synthetic and real-world datasets. Moreover, an evaluation of streaming learning algorithms with text corpora reveals both quantitative and qualitative efficacy of the algorithms on clustering documents.
    @ARTICLE { huynh_phung_neuro17streaming,
        AUTHOR = { Viet Huynh and Dinh Phung },
        TITLE = { Streaming Clustering with Bayesian Nonparametric Models },
        JOURNAL = { Neurocomputing },
        YEAR = { 2017 },
        VOLUME = { 258 },
        PAGES = { 52--62 },
        MONTH = { October },
        ISSN = { 0925-2312 },
        ABSTRACT = { Bayesian nonparametric (BNP) models are theoretically suitable for learning streaming data due to their complexity relaxation to growing data observed over time. There is a rich body of literature on developing efficient approximate methods for posterior inference in BNP models, typically dominated by MCMC. However, very limited work has addressed posterior inference in a streaming fashion, which is important to fully realize the potential of BNP models applied to real-world tasks. The main challenge resides in developing a one-pass posterior update which is consistent with the data streamed over time (i.e., data is scanned only once), which general MCMC methods fail to address. On the other hand, Dirichlet process-based mixture models are the most fundamental building blocks in the field of BNP. To this end, we develop in this paper a class of variational methods suitable for posterior inference of the Dirichlet process mixture (DPM) models where both the posterior update and data are presented in a streaming setting. We first propose new methods to advance existing variational-based inference approaches for BNP to allow the variational distributions to grow over time, hence overcoming an important limitation of current methods in imposing parametric, truncated restrictions on the variational distributions. This results in two new methods, namely truncation-free variational Bayes (TFVB) and truncation-free maximization expectation (TFME), where the latter further supports hard clustering. These inference methods form the foundation for our streaming inference algorithm, where we further adapt the recent Streaming Variational Bayes proposed in [1] to our task. To demonstrate our framework on real-world tasks whose datasets are often heterogeneous, we develop one more theoretical extension for our model to handle assorted data where each observation consists of different data types. Our experiments with automatically learning the number of clusters demonstrate the comparable inference capability of our framework in comparison with truncated variational inference algorithms for both synthetic and real-world datasets. Moreover, an evaluation of streaming learning algorithms with text corpora reveals both quantitative and qualitative efficacy of the algorithms on clustering documents. },
        FILE = { :huynh_phung_neuro17streaming - Streaming Clustering with Bayesian Nonparametric Models.pdf:PDF },
        KEYWORDS = { streaming learning, Bayesian nonparametric, variational Bayes inference, Dirichlet process, Dirichlet process mixtures, heterogeneous data sources },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.02.18 },
        URL = { http://www.sciencedirect.com/science/article/pii/S0925231217304253 },
    }
J
  • Effective Sparse Imputation of Patient Conditions in Electronic Medical Records for Emergency Risk Predictions
    Budhaditya Saha, Sunil Gupta, Dinh Phung and Svetha Venkatesh. Knowledge and Information Systems (KAIS), 2017. [ | | pdf]
    Electronic Medical Records (EMR) are being increasingly used for “risk” prediction. By “risks”, we denote outcomes such as emergency presentation, readmission, the length of hospitalizations, etc. However, EMR data analysis is complicated by missing entries. There are two reasons: the “primary reason for admission” is included in the EMR, but the comorbidities (other chronic diseases) are left uncoded; and many zero values in the data are accurate, reflecting that a patient has not accessed medical facilities. A key challenge is to deal with the peculiarities of this data: unlike many other datasets, EMR is sparse, reflecting the fact that patients have some, but not all, diseases. We propose a novel model to fill in these missing values and use the new representation for prediction of key hospital events. To “fill in” missing values, we represent the feature-patient matrix as a product of two low-rank factors, preserving the sparsity property in the product. Intuitively, the product regularization allows sparse imputation of patient conditions reflecting common comorbidities across patients. We develop a scalable optimization algorithm based on the block coordinate descent method to find an optimal solution. We evaluate the proposed framework on two real-world EMR cohorts: Cancer (7000 admissions) and Acute Myocardial Infarction (2652 admissions). Our results show that the AUC for 3-month emergency presentation prediction is improved significantly from 0.729 to 0.741 for the Cancer data and from 0.699 to 0.723 for the AMI data. Similarly, the AUC for 3-month emergency admission prediction improves from 0.730 to 0.752 for the Cancer data and from 0.682 to 0.724 for the AMI data. We also extend the proposed method to a supervised model for predicting multiple related risk outcomes (e.g., emergency presentations and admissions in hospital over 3, 6 and 12 month periods) in an integrated framework. The supervised model consistently outperforms state-of-the-art baseline methods.
    @ARTICLE { budhaditya_gupta_phung_venkatesh_kais17effective,
        AUTHOR = { Budhaditya Saha and Sunil Gupta and Dinh Phung and Svetha Venkatesh },
        TITLE = { Effective Sparse Imputation of Patient Conditions in Electronic Medical Records for Emergency Risk Predictions },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2017 },
        ABSTRACT = { Electronic Medical Records (EMR) are being increasingly used for “risk” prediction. By “risks”, we denote outcomes such as emergency presentation, readmission, the length of hospitalizations, etc. However, EMR data analysis is complicated by missing entries. There are two reasons: the “primary reason for admission” is included in the EMR, but the comorbidities (other chronic diseases) are left uncoded; and many zero values in the data are accurate, reflecting that a patient has not accessed medical facilities. A key challenge is to deal with the peculiarities of this data: unlike many other datasets, EMR is sparse, reflecting the fact that patients have some, but not all, diseases. We propose a novel model to fill in these missing values and use the new representation for prediction of key hospital events. To “fill in” missing values, we represent the feature-patient matrix as a product of two low-rank factors, preserving the sparsity property in the product. Intuitively, the product regularization allows sparse imputation of patient conditions reflecting common comorbidities across patients. We develop a scalable optimization algorithm based on the block coordinate descent method to find an optimal solution. We evaluate the proposed framework on two real-world EMR cohorts: Cancer (7000 admissions) and Acute Myocardial Infarction (2652 admissions). Our results show that the AUC for 3-month emergency presentation prediction is improved significantly from 0.729 to 0.741 for the Cancer data and from 0.699 to 0.723 for the AMI data. Similarly, the AUC for 3-month emergency admission prediction improves from 0.730 to 0.752 for the Cancer data and from 0.682 to 0.724 for the AMI data. We also extend the proposed method to a supervised model for predicting multiple related risk outcomes (e.g., emergency presentations and admissions in hospital over 3, 6 and 12 month periods) in an integrated framework. The supervised model consistently outperforms state-of-the-art baseline methods. },
        FILE = { :budhaditya_gupta_phung_venkatesh_kais17effective - Effective Sparse Imputation of Patient Conditions in Electronic Medical Records for Emergency Risk Predictions.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.17 },
        URL = { https://link.springer.com/article/10.1007/s10115-017-1038-0 },
    }
J
  • Energy-Based Localized Anomaly Detection in Video Surveillance
    Hung Vu, Tu Dinh Nguyen, Anthony Travers, Svetha Venkatesh and Dinh Phung. In The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Jeju, South Korea, May 23-26 2017. (Best Application Paper Award). [ | | pdf]
    Automated detection of abnormal events in video surveillance is an important task in research and practical applications. This is, however, a challenging problem due to the growing volume of data collected without knowledge of what should be defined as “abnormal”, and the expensive feature engineering procedure. In this paper we introduce a unified framework for anomaly detection in video based on the restricted Boltzmann machine (RBM), a recent powerful method for unsupervised learning and representation learning. Our proposed system works directly on the image pixels rather than hand-crafted features; it learns new representations for the data in a completely unsupervised manner without the need for labels, and then reconstructs the data to recognize the locations of abnormal events based on the reconstruction errors. More importantly, our approach can be deployed in both offline and streaming settings, in which the trained parameters of the model are fixed in the offline setting whilst updated incrementally as video data arrive in a stream. Experiments on three public benchmark video datasets show that our proposed method can detect and localize abnormalities at the pixel level with better accuracy than baselines, and achieves competitive performance compared with state-of-the-art approaches. Moreover, as the RBM belongs to a wider class of deep generative models, our framework lays the groundwork towards a more powerful deep unsupervised abnormality detection framework.
    @INPROCEEDINGS { vu_etal_pakdd17energy,
        AUTHOR = { Hung Vu and Tu Dinh Nguyen and Anthony Travers and Svetha Venkatesh and Dinh Phung },
        TITLE = { Energy-Based Localized Anomaly Detection in Video Surveillance },
        BOOKTITLE = { The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2017 },
        EDITOR = { Jinho Kim and Kyuseok Shim and Longbing Cao and Jae-Gil Lee and Xuemin Lin and Yang-Sae Moon },
        ADDRESS = { Jeju, South Korea },
        MONTH = { May 23-26 },
        NOTE = { Best Application Paper Award },
        ABSTRACT = { Automated detection of abnormal events in video surveillance is an important task in research and practical applications. This is, however, a challenging problem due to the growing volume of data collected without knowledge of what should be defined as “abnormal”, and the expensive feature engineering procedure. In this paper we introduce a unified framework for anomaly detection in video based on the restricted Boltzmann machine (RBM), a recent powerful method for unsupervised learning and representation learning. Our proposed system works directly on the image pixels rather than hand-crafted features; it learns new representations for the data in a completely unsupervised manner without the need for labels, and then reconstructs the data to recognize the locations of abnormal events based on the reconstruction errors. More importantly, our approach can be deployed in both offline and streaming settings, in which the trained parameters of the model are fixed in the offline setting whilst updated incrementally as video data arrive in a stream. Experiments on three public benchmark video datasets show that our proposed method can detect and localize abnormalities at the pixel level with better accuracy than baselines, and achieves competitive performance compared with state-of-the-art approaches. Moreover, as the RBM belongs to a wider class of deep generative models, our framework lays the groundwork towards a more powerful deep unsupervised abnormality detection framework. },
        FILE = { :vu_etal_pakdd17energy - Energy Based Localized Anomaly Detection in Video Surveillance.pdf:PDF },
        OWNER = { hungv },
        TIMESTAMP = { 2017.01.31 },
        URL = { https://link.springer.com/chapter/10.1007/978-3-319-57454-7_50 },
    }
C
2016
  • One-Pass Logistic Regression for Label-Drift and Large-Scale Classification on Distributed Systems
    Nguyen, Vu, Nguyen, Tu Dinh, Le, Trung, Phung, Dinh and Venkatesh, Svetha. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 1113-1118, Dec 2016. [ | | pdf | code]
    Logistic regression (LR) for classification is the workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where the class labels are not known in advance, a problem we term label-drift classification. The label-drift classification problem naturally occurs in many applications, especially in streaming settings where the incoming data may contain samples categorized with new classes that have not been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to their expensive running time. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR), to offer a principled treatment for label-drift and large-scale classification. To handle large-scale classification for big data, we further extend our OLR to a distributed setting for parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performances of our methods are comparable or better than those of state-of-the-art baselines whilst the execution time is faster by an order of magnitude. In addition, OLR and Spark-OLR are invariant to data shuffling and have no hyperparameters to tune, which significantly benefits data practitioners and overcomes the curse of big-data cross-validation to select optimal hyperparameters.
    @CONFERENCE { nguyen_etal_icdm16onepass,
        AUTHOR = { Nguyen, Vu and Nguyen, Tu Dinh and Le, Trung and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { One-Pass Logistic Regression for Label-Drift and Large-Scale Classification on Distributed Systems },
        BOOKTITLE = { 2016 IEEE 16th International Conference on Data Mining (ICDM) },
        YEAR = { 2016 },
        PAGES = { 1113-1118 },
        MONTH = { Dec },
        ABSTRACT = { Logistic regression (LR) for classification is the workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where the class labels are not known in advance, a problem we term label-drift classification. The label-drift classification problem naturally occurs in many applications, especially in streaming settings where the incoming data may contain samples categorized with new classes that have not been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to their expensive running time. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR), to offer a principled treatment for label-drift and large-scale classification. To handle large-scale classification for big data, we further extend our OLR to a distributed setting for parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performances of our methods are comparable or better than those of state-of-the-art baselines whilst the execution time is faster by an order of magnitude. In addition, OLR and Spark-OLR are invariant to data shuffling and have no hyperparameters to tune, which significantly benefits data practitioners and overcomes the curse of big-data cross-validation to select optimal hyperparameters. },
        CODE = { https://github.com/ntienvu/ICDM2016_OLR },
        DOI = { 10.1109/ICDM.2016.0145 },
        FILE = { :nguyen_etal_icdm16onepass - One Pass Logistic Regression for Label Drift and Large Scale Classification on Distributed Systems.pdf:PDF },
        KEYWORDS = { Big Data;distributed processing;pattern classification;regression analysis;Big Data cross-validation;Spark-OLR;class labels;data shuffling;distributed systems;execution time;label-drift classification problem;large-scale classification;large-scale datasets;one-pass logistic regression;optimal hyperparameter selection;sparkling OLR;Bayes methods;Big data;Context;Data models;Estimation;Industries;Logistics;Apache Spark;Logistic regression;distributed system;label-drift;large-scale classification },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.09.10 },
        URL = { http://ieeexplore.ieee.org/document/7837958/ },
    }
C
  • Dual Space Gradient Descent for Online Learning
    Le, Trung, Nguyen, Tu Dinh, Nguyen, Vu and Phung, Dinh. In Advances in Neural Information Processing (NIPS), December 2016. [ | | pdf]
    One crucial goal in kernel online learning is to bound the model size. Common approaches employ budget maintenance procedures to restrict the model size using removal, projection, or merging strategies. Although projection and merging are known in the literature to be the most effective strategies, they demand extensive computation, whilst the removal strategy fails to retain information of the removed vectors. An alternative way to address the model size problem is to apply random features to approximate the kernel function. This allows the model to be maintained directly in the random feature space, hence effectively resolving the curse of kernelization. However, this approach still suffers from a serious shortcoming, as it needs a high-dimensional random feature space to achieve a sufficiently accurate kernel approximation. Consequently, it leads to a significant increase in computational cost. To address all of these challenges, we present in this paper the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance. Consequently, our approach permits the budget to be maintained in a simple, direct and elegant way while simultaneously mitigating the impact of the dimensionality issue on learning performance. We further provide convergence analysis and extensively conduct experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with state-of-the-art baselines.
    @CONFERENCE { le_etal_nips16dual,
        AUTHOR = { Le, Trung and Nguyen, Tu Dinh and Nguyen, Vu and Phung, Dinh },
        TITLE = { Dual Space Gradient Descent for Online Learning },
        BOOKTITLE = { Advances in Neural Information Processing (NIPS) },
        YEAR = { 2016 },
        MONTH = { December },
        ABSTRACT = { One crucial goal in kernel online learning is to bound the model size. Common approaches employ budget maintenance procedures to restrict the model size using removal, projection, or merging strategies. Although projection and merging are known in the literature to be the most effective strategies, they demand extensive computation, whilst the removal strategy fails to retain information of the removed vectors. An alternative way to address the model size problem is to apply random features to approximate the kernel function. This allows the model to be maintained directly in the random feature space, hence effectively resolving the curse of kernelization. However, this approach still suffers from a serious shortcoming, as it needs a high-dimensional random feature space to achieve a sufficiently accurate kernel approximation. Consequently, it leads to a significant increase in computational cost. To address all of these challenges, we present in this paper the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance. Consequently, our approach permits the budget to be maintained in a simple, direct and elegant way while simultaneously mitigating the impact of the dimensionality issue on learning performance. We further provide convergence analysis and extensively conduct experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with state-of-the-art baselines. },
        FILE = { :le_etal_nips16dual - Dual Space Gradient Descent for Online Learning.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.16 },
        URL = { https://papers.nips.cc/paper/6560-dual-space-gradient-descent-for-online-learning.pdf },
    }
C
  • Scalable Nonparametric Bayesian Multilevel Clustering
    Viet Huynh, Dinh Phung, Svetha Venkatesh, Xuan-Long Nguyen, Matt Hoffman and Hung Bui. In Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI), pages 289-298, June 2016. [ | | pdf]
    @CONFERENCE { huynh_phung_venkatesh_nguyen_hoffman_bui_uai16scalable,
        AUTHOR = { Viet Huynh and Dinh Phung and Svetha Venkatesh and Xuan-Long Nguyen and Matt Hoffman and Hung Bui },
        TITLE = { Scalable Nonparametric {B}ayesian Multilevel Clustering },
        BOOKTITLE = { Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2016 },
        MONTH = { June },
        PUBLISHER = { AUAI Press },
        PAGES = { 289--298 },
        FILE = { :huynh_phung_venkatesh_nguyen_hoffman_bui_uai16scalable - Scalable Nonparametric Bayesian Multilevel Clustering.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.09 },
        URL = { http://auai.org/uai2016/proceedings/papers/262.pdf },
    }
C
  • Budgeted Semi-supervised Support Vector Machine
    Le, Trung, Duong, Phuong, Dinh, Mi, Nguyen, Tu, Nguyen, Vu and Phung, Dinh. In 32nd Conference on Uncertainty in Artificial Intelligence (UAI), June 2016. [ | | pdf]
    @CONFERENCE { le_duong_dinh_nguyen_nguyen_phung_uai16budgeted,
        AUTHOR = { Le, Trung and Duong, Phuong and Dinh, Mi and Nguyen, Tu and Nguyen, Vu and Phung, Dinh },
        TITLE = { Budgeted Semi-supervised {S}upport {V}ector {M}achine },
        BOOKTITLE = { 32nd Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2016 },
        MONTH = { June },
        FILE = { :le_duong_dinh_nguyen_nguyen_phung_uai16budgeted - Budgeted Semi Supervised Support Vector Machine.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.09 },
        URL = { http://auai.org/uai2016/proceedings/papers/110.pdf },
    }
C
  • Nonparametric Budgeted Stochastic Gradient Descent
    Le, Trung, Nguyen, Vu, Nguyen, Tu Dinh and Phung, Dinh. In 19th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS), May 2016. [ | | pdf]
    @CONFERENCE { le_nguyen_phung_aistats16nonparametric,
        AUTHOR = { Le, Trung and Nguyen, Vu and Nguyen, Tu Dinh and Phung, Dinh },
        TITLE = { Nonparametric Budgeted Stochastic Gradient Descent },
        BOOKTITLE = { 19th Intl. Conf. on Artificial Intelligence and Statistics (AISTATS) },
        YEAR = { 2016 },
        MONTH = { May },
        FILE = { :le_nguyen_phung_aistats16nonparametric - Nonparametric Budgeted Stochastic Gradient Descent.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://www.jmlr.org/proceedings/papers/v51/le16.pdf },
    }
C
  • Introduction: special issue of selected papers from ACML 2014
    Dinh Phung, Hang Li, Tru Cao, Tu-Bao Ho and Zhi-Hua Zhou, editors. Machine Learning, 103(2):137-139, Springer, May 2016. [ | | pdf]
    @PROCEEDINGS { li_phung_cao_ho_zhou_acml14_selectedpapers,
        TITLE = { Introduction: special issue of selected papers from {ACML} 2014 },
        YEAR = { 2016 },
        EDITOR = { Dinh Phung and Hang Li and Tru Cao and Tu-Bao Ho and Zhi-Hua Zhou },
        VOLUME = { 103 },
        NUMBER = { 2 },
        PUBLISHER = { Springer },
        MONTH = { May },
        FILE = { :li_phung_cao_ho_zhou_acml14_selectedpapers - Introduction_ Special Issue of Selected Papers from ACML 2014.pdf:PDF },
        ISSN = { 1573-0565 },
        JOURNAL = { Machine Learning },
        OWNER = { Thanh-Binh Nguyen },
        PAGES = { 137--139 },
        TIMESTAMP = { 2016.04.11 },
        URL = { http://dx.doi.org/10.1007/s10994-016-5549-9 },
    }
P
  • Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View
    Luo, Wei, Phung, Dinh, Tran, Truyen, Gupta, Sunil, Rana, Santu, Karmakar, Chandan, Shilton, Alistair, Yearwood, John, Dimitrova, Nevenka, Ho, Bao Tu, Venkatesh, Svetha and Berk, Michael. J Med Internet Res, 18(12):e323, Dec 2016. [ | | pdf]
    Background: As more and more researchers are turning to big data for new opportunities of biomedical discoveries, machine learning models, as the backbone of big data analysis, are mentioned more often in biomedical journals. However, owing to the inherent complexity of machine learning methods, they are prone to misuse. Because of the flexibility in specifying machine learning models, the results are often insufficiently reported in research articles, hindering reliable assessment of model validity and consistent interpretation of model outputs. Objective: To attain a set of guidelines on the use of machine learning predictive models within clinical settings to make sure the models are correctly applied and sufficiently reported so that true discoveries can be distinguished from random coincidence. Methods: A multidisciplinary panel of machine learning experts, clinicians, and traditional statisticians were interviewed, using an iterative process in accordance with the Delphi method. Results: The process produced a set of guidelines that consists of (1) a list of reporting items to be included in a research article and (2) a set of practical sequential steps for developing predictive models. Conclusions: A set of guidelines was generated to enable correct application of machine learning models and consistent reporting of model specifications and results in biomedical research. We believe that such guidelines will accelerate the adoption of big data analysis, particularly with machine learning methods, in the biomedical research community.
    @ARTICLE { Luo_etal_jmir16guidelines,
        AUTHOR = { Luo, Wei and Phung, Dinh and Tran, Truyen and Gupta, Sunil and Rana, Santu and Karmakar, Chandan and Shilton, Alistair and Yearwood, John and Dimitrova, Nevenka and Ho, Bao Tu and Venkatesh, Svetha and Berk, Michael },
        TITLE = { Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View },
        JOURNAL = { J Med Internet Res },
        YEAR = { 2016 },
        VOLUME = { 18 },
        NUMBER = { 12 },
        PAGES = { e323 },
        MONTH = { Dec },
        ABSTRACT = { Background: As more and more researchers are turning to big data for new opportunities of biomedical discoveries, machine learning models, as the backbone of big data analysis, are mentioned more often in biomedical journals. However, owing to the inherent complexity of machine learning methods, they are prone to misuse. Because of the flexibility in specifying machine learning models, the results are often insufficiently reported in research articles, hindering reliable assessment of model validity and consistent interpretation of model outputs. Objective: To attain a set of guidelines on the use of machine learning predictive models within clinical settings to make sure the models are correctly applied and sufficiently reported so that true discoveries can be distinguished from random coincidence. Methods: A multidisciplinary panel of machine learning experts, clinicians, and traditional statisticians were interviewed, using an iterative process in accordance with the Delphi method. Results: The process produced a set of guidelines that consists of (1) a list of reporting items to be included in a research article and (2) a set of practical sequential steps for developing predictive models. Conclusions: A set of guidelines was generated to enable correct application of machine learning models and consistent reporting of model specifications and results in biomedical research. We believe that such guidelines will accelerate the adoption of big data analysis, particularly with machine learning methods, in the biomedical research community. },
        DAY = { 16 },
        DOI = { 10.2196/jmir.5870 },
        FILE = { :Luo_etal_jmir16guidelines - Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research_ a Multidisciplinary View.pdf:PDF },
        KEYWORDS = { machine learning, clinical prediction rule, guideline },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.12.21 },
        URL = { http://www.jmir.org/2016/12/e323/ },
    }
J
  • Data Clustering Using Side Information Dependent Chinese Restaurant Processes
    Li, Cheng, Rana, Santu, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 47(2):463-488, May 2016. [ | | pdf]
    Side information, or auxiliary information associated with documents or image content, provides hints for clustering. We propose a new model, side information dependent Chinese restaurant process, which exploits side information in a Bayesian nonparametric model to improve data clustering. We introduce side information into the framework of distance dependent Chinese restaurant process using a robust decay function to handle noisy side information. The threshold parameter of the decay function is updated automatically in the Gibbs sampling process. A fast inference algorithm is proposed. We evaluate our approach on four datasets: Cora, 20 Newsgroups, NUS-WIDE and one medical dataset. Types of side information explored in this paper include citations, authors, tags, keywords and auxiliary clinical information. The comparison with the state-of-the-art approaches based on standard performance measures (NMI, F1) clearly shows the superiority of our approach.
    @ARTICLE { li_rana_phung_venkatesh_kais16,
        AUTHOR = { Li, Cheng and Rana, Santu and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Data Clustering Using Side Information Dependent {C}hinese Restaurant Processes },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2016 },
        VOLUME = { 47 },
        NUMBER = { 2 },
        PAGES = { 463--488 },
        MONTH = { May },
        ABSTRACT = { Side information, or auxiliary information associated with documents or image content, provides hints for clustering. We propose a new model, side information dependent Chinese restaurant process, which exploits side information in a Bayesian nonparametric model to improve data clustering. We introduce side information into the framework of distance dependent Chinese restaurant process using a robust decay function to handle noisy side information. The threshold parameter of the decay function is updated automatically in the Gibbs sampling process. A fast inference algorithm is proposed. We evaluate our approach on four datasets: Cora, 20 Newsgroups, NUS-WIDE and one medical dataset. Types of side information explored in this paper include citations, authors, tags, keywords and auxiliary clinical information. The comparison with the state-of-the-art approaches based on standard performance measures (NMI, F1) clearly shows the superiority of our approach. },
        DOI = { 10.1007/s10115-015-0834-7 },
        FILE = { :li_rana_phung_venkatesh_kais16 - Data Clustering Using Side Information Dependent Chinese Restaurant Processes.pdf:PDF },
        KEYWORDS = { Side information, Similarity, Data clustering, Bayesian nonparametric models },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.03.02 },
        URL = { http://link.springer.com/article/10.1007/s10115-015-0834-7 },
    }
J
  • Multiple Kernel Learning with Data Augmentation
    Nguyen, Khanh, Le, Trung, Nguyen, Vu, Nguyen, Tu Dinh and Phung, Dinh. In 8th Asian Conference on Machine Learning (ACML), Nov. 2016. [ | ]
    @CONFERENCE { nguyen_etal_acml16multiple,
        AUTHOR = { Nguyen, Khanh and Le, Trung and Nguyen, Vu and Nguyen, Tu Dinh and Phung, Dinh },
        TITLE = { Multiple Kernel Learning with Data Augmentation },
        BOOKTITLE = { 8th Asian Conference on Machine Learning (ACML) },
        YEAR = { 2016 },
        MONTH = { Nov. },
        FILE = { :nguyen_etal_acml16multiple - Multiple Kernel Learning with Data Augmentation.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Exceptional Contrast Set Mining: Moving Beyond the Deluge of the Obvious
    Nguyen, Dang, Luo, Wei, Phung, Dinh and Venkatesh, Svetha. In Advances in Artificial Intelligence, pages 455-468, Springer, 2016. (Student travel award). [ | | pdf]
    Data scientists, with access to fast growing data and computing power, constantly look for algorithms with greater detection power to discover “novel” knowledge. But more often than not, their algorithms give them too many outputs that are either highly speculative or simply confirming what the domain experts already know. To escape this dilemma, we need algorithms that move beyond the obvious association analyses and leverage domain analytic objectives (aka. KPIs) to look for higher order connections. We propose a new technique Exceptional Contrast Set Mining that first gathers a succinct collection of affirmative contrast sets based on the principle of redundant information elimination. Then it discovers exceptional contrast sets that contradict the affirmative contrast sets. The algorithm has been successfully applied to several analytic consulting projects. In particular, during an analysis of a state-wide cancer registry, it discovered a surprising regional difference in breast cancer screening.
    @INCOLLECTION { nguyen_etal_ai16exceptional,
        AUTHOR = { Nguyen, Dang and Luo, Wei and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Exceptional Contrast Set Mining: Moving Beyond the Deluge of the Obvious },
        BOOKTITLE = { Advances in Artificial Intelligence },
        PUBLISHER = { Springer },
        YEAR = { 2016 },
        VOLUME = { 9992 },
        PAGES = { 455--468 },
        NOTE = { Student travel award },
        ABSTRACT = { Data scientists, with access to fast growing data and computing power, constantly look for algorithms with greater detection power to discover “novel” knowledge. But more often than not, their algorithms give them too many outputs that are either highly speculative or simply confirming what the domain experts already know. To escape this dilemma, we need algorithms that move beyond the obvious association analyses and leverage domain analytic objectives (aka. KPIs) to look for higher order connections. We propose a new technique Exceptional Contrast Set Mining that first gathers a succinct collection of affirmative contrast sets based on the principle of redundant information elimination. Then it discovers exceptional contrast sets that contradict the affirmative contrast sets. The algorithm has been successfully applied to several analytic consulting projects. In particular, during an analysis of a state-wide cancer registry, it discovered a surprising regional difference in breast cancer screening. },
        FILE = { :nguyen_etal_ai16exceptional - Exceptional Contrast Set Mining_ Moving beyond the Deluge of the Obvious.pdf:PDF },
        GROUPS = { Contrast Set Mining },
        ORGANIZATION = { Springer },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.01.05 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-50127-7_39 },
    }
BC
  • SECC: Simultaneous extraction of context and community from pervasive signals
    Nguyen, T., Nguyen, V., Salim, F.D. and Phung, D.. In IEEE Intl. Conf. on Pervasive Computing and Communications (PERCOM), pages 1-9, March 2016. [ | | pdf]
    Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as the way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model is rooted in nested Dirichlet process theory, which allows a nested structure to be built to explain data at multiple levels. We demonstrate our framework on three public datasets where the advantages of the proposed approach are validated.
    @INPROCEEDINGS { nguyen_nguyen_salim_phung_percom16secc,
        AUTHOR = { Nguyen, T. and Nguyen, V. and Salim, F.D. and Phung, D. },
        TITLE = { {SECC}: Simultaneous extraction of context and community from pervasive signals },
        BOOKTITLE = { IEEE Intl. Conf. on Pervasive Computing and Communications (PERCOM) },
        YEAR = { 2016 },
        PAGES = { 1-9 },
        MONTH = { March },
        ABSTRACT = { Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as the way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model is rooted in nested Dirichlet process theory, which allows a nested structure to be built to explain data at multiple levels. We demonstrate our framework on three public datasets where the advantages of the proposed approach are validated. },
        DOI = { 10.1109/PERCOM.2016.7456501 },
        FILE = { :nguyen_nguyen_salim_phung_percom16secc - SECC_ Simultaneous Extraction of Context and Community from Pervasive Signals.pdf:PDF },
        KEYWORDS = { Bluetooth;Context;Context modeling;Data mining;Data models;Feature extraction;Mixture models },
        URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7456501 },
    }
C
  • Nonparametric discovery of movement patterns from accelerometer signals
    Nguyen, T., Gupta, S., Venkatesh, S. and Phung, D.. Pattern Recognition Letters, 70(C):52-58, Jan. 2016. [ | | pdf]
    Monitoring daily physical activity plays an important role in disease prevention and intervention. This paper proposes an approach to monitor the body movement intensity levels from accelerometer data. We collect the data using the accelerometer in a realistic setting without any supervision. The ground-truth of activities is provided by the participants themselves using an experience sampling application running on their mobile phones. We compute a novel feature that has a strong correlation with the movement intensity. We use the hierarchical Dirichlet process (HDP) model to detect the activity levels from this feature. With Bayesian nonparametric priors over the parameters, the model can infer the number of levels automatically. By demonstrating the approach on the publicly available USC-HAD dataset that includes ground-truth activity labels, we show a strong correlation between the discovered activity levels and the movement intensity of the activities. This correlation is further confirmed using our newly collected dataset. We further use the extracted patterns as features for clustering and classifying the activity sequences to improve performance.
    @ARTICLE { nguyen_gupta_venkatesh_phung_pr16nonparametric,
        AUTHOR = { Nguyen, T. and Gupta, S. and Venkatesh, S. and Phung, D. },
        TITLE = { Nonparametric discovery of movement patterns from accelerometer signals },
        JOURNAL = { Pattern Recognition Letters },
        YEAR = { 2016 },
        VOLUME = { 70 },
        NUMBER = { C },
        PAGES = { 52--58 },
        MONTH = { Jan. },
        ISSN = { 0167-8655 },
        ABSTRACT = { Monitoring daily physical activity plays an important role in disease prevention and intervention. This paper proposes an approach to monitor the body movement intensity levels from accelerometer data. We collect the data using the accelerometer in a realistic setting without any supervision. The ground-truth of activities is provided by the participants themselves using an experience sampling application running on their mobile phones. We compute a novel feature that has a strong correlation with the movement intensity. We use the hierarchical Dirichlet process (HDP) model to detect the activity levels from this feature. With Bayesian nonparametric priors over the parameters, the model can infer the number of levels automatically. By demonstrating the approach on the publicly available USC-HAD dataset that includes ground-truth activity labels, we show a strong correlation between the discovered activity levels and the movement intensity of the activities. This correlation is further confirmed using our newly collected dataset. We further use the extracted patterns as features for clustering and classifying the activity sequences to improve performance. },
        DOI = { 10.1016/j.patrec.2015.11.003 },
        FILE = { :nguyen_gupta_venkatesh_phung_pr16nonparametric - Nonparametric Discovery of Movement Patterns from Accelerometer Signals.pdf:PDF },
        KEYWORDS = { Accelerometer, Activity recognition, Bayesian nonparametric, Dirichlet process, Movement intensity },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.10 },
        URL = { http://www.sciencedirect.com/science/article/pii/S016786551500389X },
    }
J
  • Preterm Birth Prediction: Stable Selection of Interpretable Rules from High Dimensional Data
    Tran, Truyen, Luo, Wei, Phung, Dinh, Morris, Jonathan, Rickard, Kristen and Venkatesh, Svetha. In Proceedings of the 1st Machine Learning for Healthcare Conference, pages 164-177, 2016. [ | | pdf]
    Preterm births occur at an alarming rate of 10-15%. Preemies have a higher risk of infant mortality, developmental retardation and long-term disabilities. Predicting preterm birth is difficult, even for the most experienced clinicians. The most well-designed clinical study thus far reaches a modest sensitivity of 18.2–24.2% at specificity of 28.6–33.3%. We take a different approach by exploiting databases of normal hospital operations. Our aims are twofold: (i) to derive an easy-to-use, interpretable prediction rule with quantified uncertainties, and (ii) to construct accurate classifiers for preterm birth prediction. Our approach is to automatically generate and select from hundreds (if not thousands) of possible predictors using stability-aware techniques. Derived from a large database of 15,814 women, our simplified prediction rule with only 10 items has sensitivity of 62.3% at specificity of 81.5%.
    @INPROCEEDINGS { tran_etal_mlhc16pretern,
        AUTHOR = { Tran, Truyen and Luo, Wei and Phung, Dinh and Morris, Jonathan and Rickard, Kristen and Venkatesh, Svetha },
        TITLE = { Preterm Birth Prediction: Stable Selection of Interpretable Rules from High Dimensional Data },
        BOOKTITLE = { Proceedings of the 1st Machine Learning for Healthcare Conference },
        YEAR = { 2016 },
        EDITOR = { Finale Doshi-Velez and Jim Fackler and David Kale and Byron Wallace and Jenna Wiens },
        VOLUME = { 56 },
        SERIES = { JMLR Workshop and Conference Proceedings },
        PAGES = { 164--177 },
        PUBLISHER = { JMLR },
        ABSTRACT = { Preterm births occur at an alarming rate of 10-15%. Preemies have a higher risk of infant mortality, developmental retardation and long-term disabilities. Predicting preterm birth is difficult, even for the most experienced clinicians. The most well-designed clinical study thus far reaches a modest sensitivity of 18.2–24.2% at specificity of 28.6–33.3%. We take a different approach by exploiting databases of normal hospital operations. Our aims are twofold: (i) to derive an easy-to-use, interpretable prediction rule with quantified uncertainties, and (ii) to construct accurate classifiers for preterm birth prediction. Our approach is to automatically generate and select from hundreds (if not thousands) of possible predictors using stability-aware techniques. Derived from a large database of 15,814 women, our simplified prediction rule with only 10 items has sensitivity of 62.3% at specificity of 81.5%. },
        FILE = { :tran_etal_mlhc16pretern - Preterm Birth Prediction_ Stable Selection of Interpretable Rules from High Dimensional Data.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.11.02 },
        URL = { http://jmlr.org/proceedings/papers/v56/Tran16.html },
    }
C
  • Computer Assisted Autism Interventions for India
    Vellanki, Pratibha, Greenhill, Stewart, Duong, Thi, Phung, Dinh, Venkatesh, Svetha, Godwin, Jayashree, Achary, Kishna V. and Varkey, Blessin. In Proceedings of the 28th Australian Conference on Computer-Human Interaction, pages 618-622, New York, NY, USA, 2016. [ | | pdf]
    @INPROCEEDINGS { vellanki_etal_ozchi16computer,
        AUTHOR = { Vellanki, Pratibha and Greenhill, Stewart and Duong, Thi and Phung, Dinh and Venkatesh, Svetha and Godwin, Jayashree and Achary, Kishna V. and Varkey, Blessin },
        TITLE = { Computer Assisted Autism Interventions for {I}ndia },
        BOOKTITLE = { Proceedings of the 28th Australian Conference on Computer-Human Interaction },
        YEAR = { 2016 },
        SERIES = { OzCHI '16 },
        PAGES = { 618--622 },
        ADDRESS = { New York, NY, USA },
        PUBLISHER = { ACM },
        ACMID = { 3011007 },
        DOI = { 10.1145/3010915.3011007 },
        FILE = { :vellanki_etal_ozchi16computer - Computer Assisted Autism Interventions for India.pdf:PDF },
        ISBN = { 978-1-4503-4618-4 },
        KEYWORDS = { Hindi, India, assistive technology, autism, early intervention, translation },
        LOCATION = { Launceston, Tasmania, Australia },
        NUMPAGES = { 5 },
        URL = { http://doi.acm.org/10.1145/3010915.3011007 },
    }
C
  • A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested Dirichlet Process
    Nguyen, T., Nguyen, V., Salim, F.D., Le, D.V. and Phung, D.. Pervasive and Mobile Computing (PMC), 2016. [ | | pdf]
    Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as a way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model is rooted in nested Dirichlet process theory, which allows a nested structure to be built to summarize data at multiple levels. We demonstrate our framework on five datasets where the advantages of the proposed approach are validated.
    @ARTICLE { nguyen_nguyen_flora_le_phung_pmc16simultaneous,
        AUTHOR = { Nguyen, T. and Nguyen, V. and Salim, F.D. and Le, D.V. and Phung, D. },
        TITLE = { A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested {D}irichlet Process },
        JOURNAL = { Pervasive and Mobile Computing (PMC) },
        YEAR = { 2016 },
        ABSTRACT = { Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics or behaviors and then use them as a way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model is rooted in nested Dirichlet process theory, which allows a nested structure to be built to summarize data at multiple levels. We demonstrate our framework on five datasets where the advantages of the proposed approach are validated. },
        DOI = { 10.1016/j.pmcj.2016.08.019 },
        FILE = { :nguyen_nguyen_flora_le_phung_pmc16simultaneous - A Simultaneous Extraction of Context and Community from Pervasive Signals Using Nested Dirichlet Process.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.17 },
        URL = { http://www.sciencedirect.com/science/article/pii/S1574119216302097 },
    }
J
  • Nonparametric discovery and analysis of learning patterns and autism subgroups from therapeutic data
    Vellanki, Pratibha, Duong, Thi, Gupta, Sunil, Venkatesh, Svetha and Phung, Dinh. Knowledge and Information Systems (KAIS), 2016. [ | | pdf]
    The spectrum nature and heterogeneity within autism spectrum disorders (ASD) pose a challenge for treatment. Personalisation of syllabus for children with ASD can improve the efficacy of learning by adjusting the number of opportunities and deciding the course of syllabus. We research the data-motivated approach in an attempt to disentangle this heterogeneity for personalisation of syllabus. With the help of technology and a structured syllabus, collecting data while a child with ASD masters the skills is made possible. The performance data collected are, however, growing and contain missing elements based on the pace and the course each child takes while navigating through the syllabus. Bayesian nonparametric methods are known for automatically discovering the number of latent components and their parameters when the model involves higher complexity. We propose a nonparametric Bayesian matrix factorisation model that discovers learning patterns and the way participants associate with them. Our model is built upon the linear Poisson gamma model (LPGM) with an Indian buffet process prior and extended to incorporate data with missing elements. In this paper, for the first time we have presented learning patterns deduced automatically from data mining and machine learning methods using intervention data recorded for over 500 children with ASD. We compare the results with non-negative matrix factorisation and K-means, which being parametric, not only require us to specify the number of learning patterns in advance, but also do not have a principled approach to deal with missing data. The F1 score observed over varying degrees of the similarity measure (Jaccard Index) suggests that LPGM yields the best outcome. By observing these patterns with additional knowledge regarding the syllabus it may be possible to observe the progress and dynamically modify the syllabus for improved learning.
    @ARTICLE { vellanki_etal_kis16nonparametric,
        AUTHOR = { Vellanki, Pratibha and Duong, Thi and Gupta, Sunil and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Nonparametric discovery and analysis of learning patterns and autism subgroups from therapeutic data },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2016 },
        PAGES = { 1--31 },
        ISSN = { 0219-3116 },
        ABSTRACT = { The spectrum nature and heterogeneity within autism spectrum disorders (ASD) pose a challenge for treatment. Personalisation of syllabus for children with ASD can improve the efficacy of learning by adjusting the number of opportunities and deciding the course of syllabus. We research the data-motivated approach in an attempt to disentangle this heterogeneity for personalisation of syllabus. With the help of technology and a structured syllabus, collecting data while a child with ASD masters the skills is made possible. The performance data collected are, however, growing and contain missing elements based on the pace and the course each child takes while navigating through the syllabus. Bayesian nonparametric methods are known for automatically discovering the number of latent components and their parameters when the model involves higher complexity. We propose a nonparametric Bayesian matrix factorisation model that discovers learning patterns and the way participants associate with them. Our model is built upon the linear Poisson gamma model (LPGM) with an Indian buffet process prior and extended to incorporate data with missing elements. In this paper, for the first time we have presented learning patterns deduced automatically from data mining and machine learning methods using intervention data recorded for over 500 children with ASD. We compare the results with non-negative matrix factorisation and K-means, which being parametric, not only require us to specify the number of learning patterns in advance, but also do not have a principled approach to deal with missing data. The F1 score observed over varying degrees of the similarity measure (Jaccard Index) suggests that LPGM yields the best outcome. By observing these patterns with additional knowledge regarding the syllabus it may be possible to observe the progress and dynamically modify the syllabus for improved learning. },
        DOI = { 10.1007/s10115-016-0971-7 },
        FILE = { :vellanki_etal_kis16nonparametric - Nonparametric Discovery and Analysis of Learning Patterns and Autism Subgroups from Therapeutic Data.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.02 },
        URL = { http://dx.doi.org/10.1007/s10115-016-0971-7 },
    }
J
  • Forecasting Daily Patient Outflow From a Ward Having No Real-Time Clinical Data
    Gopakumar, Shivapratap, Tran, Truyen, Luo, Wei, Phung, Dinh and Venkatesh, Svetha. JMIR Med Inform, 4(3):e25, Jul 2016. [ | | pdf]
    Objective: Our study investigates different models to forecast the total number of next-day discharges from an open ward having no real-time clinical data. Methods: We compared 5 popular regression algorithms to model total next-day discharges: (1) autoregressive integrated moving average (ARIMA), (2) the autoregressive moving average with exogenous variables (ARMAX), (3) k-nearest neighbor regression, (4) random forest regression, and (5) support vector regression. Although the autoregressive integrated moving average model relied on past 3-month discharges, nearest neighbor forecasting used the median of similar discharges in the past in estimating next-day discharge. In addition, the ARMAX model used the day of the week and number of patients currently in ward as exogenous variables. For the random forest and support vector regression models, we designed a predictor set of 20 patient features and 88 ward-level features. Results: Our data consisted of 12,141 patient visits over 1826 days. Forecasting quality was measured using mean forecast error, mean absolute error, symmetric mean absolute percentage error, and root mean square error. When compared with a moving average prediction model, all 5 models demonstrated superior performance with the random forests achieving 22.7% improvement in mean absolute error, for all days in the year 2014. Conclusions: In the absence of clinical information, our study recommends using patient-level and ward-level data in predicting next-day discharges. Random forest and support vector regression models are able to use all available features from such data, resulting in superior performance over traditional autoregressive methods. An intelligent estimate of available beds in wards plays a crucial role in relieving access block in emergency departments.
    @ARTICLE { gopakumar_etal_jmir16forecasting,
        AUTHOR = { Gopakumar, Shivapratap and Tran, Truyen and Luo, Wei and Phung, Dinh and Venkatesh, Svetha },
        JOURNAL = { JMIR Med Inform },
        TITLE = { Forecasting Daily Patient Outflow From a Ward Having No Real-Time Clinical Data },
        YEAR = { 2016 },
        MONTH = { Jul },
        NUMBER = { 3 },
        PAGES = { e25 },
        VOLUME = { 4 },
        ABSTRACT = { Objective: Our study investigates different models to forecast the total number of next-day discharges from an open ward having no real-time clinical data. Methods: We compared 5 popular regression algorithms to model total next-day discharges: (1) autoregressive integrated moving average (ARIMA), (2) the autoregressive moving average with exogenous variables (ARMAX), (3) k-nearest neighbor regression, (4) random forest regression, and (5) support vector regression. Although the autoregressive integrated moving average model relied on past 3-month discharges, nearest neighbor forecasting used median of similar discharges in the past in estimating next-day discharge. In addition, the ARMAX model used the day of the week and number of patients currently in ward as exogenous variables. For the random forest and support vector regression models, we designed a predictor set of 20 patient features and 88 ward-level features. Results: Our data consisted of 12,141 patient visits over 1826 days. Forecasting quality was measured using mean forecast error, mean absolute error, symmetric mean absolute percentage error, and root mean square error. When compared with a moving average prediction model, all 5 models demonstrated superior performance with the random forests achieving 22.7\% improvement in mean absolute error, for all days in the year 2014. Conclusions: In the absence of clinical information, our study recommends using patient-level and ward-level data in predicting next-day discharges. Random forest and support vector regression models are able to use all available features from such data, resulting in superior performance over traditional autoregressive methods. An intelligent estimate of available beds in wards plays a crucial role in relieving access block in emergency departments. },
        DAY = { 21 },
        DOI = { 10.2196/medinform.5650 },
        FILE = { :gopakumar_etal_jmir16forecasting - Forecasting Daily Patient Outflow from a Ward Having No Real Time Clinical Data.pdf:PDF },
        KEYWORDS = { patient flow },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.02 },
        URL = { http://medinform.jmir.org/2016/3/e25/ },
    }
J
  • Control Matching via Discharge Code Sequences
    Nguyen, Dang, Luo, Wei, Phung, Dinh and Venkatesh, Svetha. In Machine Learning for Health @ NIPS 2016, 2016. [ | ]
    In this paper, we consider the patient similarity matching problem over a cancer cohort of more than 220,000 patients. Our approach first leverages the Word2Vec framework to embed ICD codes into a vector-valued representation. We then propose a sequential algorithm for case-control matching in this representation space of diagnosis codes. The novel practice of applying sequential matching in the vector representation space lifted the matching accuracy measured through multiple clinical outcomes. We report results on a large-scale dataset to demonstrate the effectiveness of our method. For such a large dataset, where most clinical information has been codified, the new method is particularly relevant.
    @CONFERENCE { nguyen_etal_mlh16control,
        AUTHOR = { Nguyen, Dang and Luo, Wei and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Control Matching via Discharge Code Sequences },
        BOOKTITLE = { Machine Learning for Health @ NIPS 2016 },
        YEAR = { 2016 },
        ABSTRACT = { In this paper, we consider the patient similarity matching problem over a cancer cohort of more than 220,000 patients. Our approach first leverages the Word2Vec framework to embed ICD codes into a vector-valued representation. We then propose a sequential algorithm for case-control matching in this representation space of diagnosis codes. The novel practice of applying sequential matching in the vector representation space lifted the matching accuracy measured through multiple clinical outcomes. We report results on a large-scale dataset to demonstrate the effectiveness of our method. For such a large dataset, where most clinical information has been codified, the new method is particularly relevant. },
        FILE = { :nguyen_etal_mlh16control - Control Matching Via Discharge Code Sequences.pdf:PDF },
        JOURNAL = { arXiv preprint arXiv:1612.01812 },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2017.02.06 },
    }
C
  • Effect of Social Capital on Emotion, Language Style and Latent Topics in Online Depression Community
    Dao, Bo, Nguyen, Thin, Venkatesh, Svetha and Phung, Dinh. In 12th IEEE-RIVF Intl. Conf. on Computing and Communication Technologies, Nov. 2016. (Best Runner-up Student Paper Award). [ | ]
    Social capital is linked to mental illness. It has been proposed that higher social capital is associated with better mental well-being in both individuals and groups in offline settings. However, in online settings, the association between online social capital and mental health conditions has not yet been explored. Social media offer us a rich opportunity to determine the link between social capital and aspects of mental well-being. In this paper, we examine how social capital, based on the levels of social connectivity of bloggers, can be connected to aspects of depression in individuals and in an online depression community. We explore apparent properties of textual content, including expressed emotions, language styles and latent topics, of a large corpus of blog posts to analyze the aspect of social capital in the community. Using data collected from the online LiveJournal depression community, we apply both statistical tests and machine learning approaches to examine how predictive factors vary between low and high social capital groups. Significant differences are found between low and high social capital groups when characterized by a set of latent topics and language features derived from blog posts, suggesting discriminative features that proved to be useful in the classification task. This shows that linguistic styles are better predictors than latent topics as features. The findings indicate the potential of using social media as a sensor for monitoring mental well-being in online settings.
    @CONFERENCE { dao_etal_rivf16effect,
        AUTHOR = { Dao, Bo and Nguyen, Thin and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Effect of Social Capital on Emotion, Language Style and Latent Topics in Online Depression Community },
        BOOKTITLE = { 12th IEEE-RIVF Intl. Conf. on Computing and Communication Technologies },
        YEAR = { 2016 },
        MONTH = { Nov. },
        NOTE = { Best Runner-up Student Paper Award },
        ABSTRACT = { Social capital is linked to mental illness. It has been proposed that higher social capital is associated with better mental well-being in both individuals and groups in offline settings. However, in online settings, the association between online social capital and mental health conditions has not yet been explored. Social media offer us a rich opportunity to determine the link between social capital and aspects of mental well-being. In this paper, we examine how social capital, based on the levels of social connectivity of bloggers, can be connected to aspects of depression in individuals and in an online depression community. We explore apparent properties of textual content, including expressed emotions, language styles and latent topics, of a large corpus of blog posts to analyze the aspect of social capital in the community. Using data collected from the online LiveJournal depression community, we apply both statistical tests and machine learning approaches to examine how predictive factors vary between low and high social capital groups. Significant differences are found between low and high social capital groups when characterized by a set of latent topics and language features derived from blog posts, suggesting discriminative features that proved to be useful in the classification task. This shows that linguistic styles are better predictors than latent topics as features. The findings indicate the potential of using social media as a sensor for monitoring mental well-being in online settings. },
        FILE = { :dao_etal_rivf16effect - Effect of Social Capital on Emotion, Language Style and Latent Topics in Online Depression Community.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.09.10 },
    }
C
  • MCNC: Multi-channel Nonparametric Clustering from Heterogeneous Data
    Nguyen, T-B., Nguyen, V., Venkatesh, S. and Phung, D.. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. (Finalist Best IBM Track 1 Student Paper Award). [ | ]
    Bayesian nonparametric (BNP) models have recently become popular due to their flexibility in identifying the unknown number of clusters. However, they have difficulties handling heterogeneous data from multiple sources. Existing BNP works either treat each of these sources independently -- hence do not benefit from the correlating information between them -- or require data sources to be specified as primary or context channels. In this paper, we present a BNP framework, termed MCNC, which has the ability to (1) discover co-patterns from multiple sources; (2) explore multi-channel data simultaneously and equally; (3) automatically identify a suitable number of patterns from data; and (4) handle missing data. The key idea is to tweak the base measure of a BNP model to be a product-space. We demonstrate our framework on synthetic and real-world datasets to discover identity--location--time (a.k.a. who--where--when) patterns in two settings of complete and missing data. The experimental results highlight the effectiveness of our MCNC in both cases of complete and missing data.
    @CONFERENCE { nguyen_nguyen_venkatesh_phung_icpr16mcnc,
        AUTHOR = { Nguyen, T-B. and Nguyen, V. and Venkatesh, S. and Phung, D. },
        TITLE = { {MCNC}: Multi-channel Nonparametric Clustering from Heterogeneous Data },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        NOTE = { Finalist Best IBM Track 1 Student Paper Award },
        ABSTRACT = { Bayesian nonparametric (BNP) models have recently become popular due to their flexibility in identifying the unknown number of clusters. However, they have difficulties handling heterogeneous data from multiple sources. Existing BNP works either treat each of these sources independently -- hence do not benefit from the correlating information between them -- or require data sources to be specified as primary or context channels. In this paper, we present a BNP framework, termed MCNC, which has the ability to (1) discover co-patterns from multiple sources; (2) explore multi-channel data simultaneously and equally; (3) automatically identify a suitable number of patterns from data; and (4) handle missing data. The key idea is to tweak the base measure of a BNP model to be a product-space. We demonstrate our framework on synthetic and real-world datasets to discover identity--location--time (a.k.a. who--where--when) patterns in two settings of complete and missing data. The experimental results highlight the effectiveness of our MCNC in both cases of complete and missing data. },
        FILE = { :nguyen_nguyen_venkatesh_phung_icpr16mcnc - MCNC_ Multi Channel Nonparametric Clustering from Heterogeneous Data.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Stable Clinical Prediction using Graph Support Vector Machines
    Kamkar, Iman, Gupta, Sunil, Li, Cheng, Phung, Dinh and Venkatesh, Svetha. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ]
    @CONFERENCE { kamkar_gupta_li_phung_venkatesh_icpr16stable,
        AUTHOR = { Kamkar, Iman and Gupta, Sunil and Li, Cheng and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Stable Clinical Prediction using Graph {S}upport {V}ector {M}achines },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        FILE = { :kamkar_gupta_li_phung_venkatesh_icpr16stable - Stable Clinical Prediction Using Graph Support Vector Machines.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Distributed Data Augmented Support Vector Machine on Spark
    Nguyen, Tu, Nguyen, Vu, Le, Trung and Phung, Dinh. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ]
    @CONFERENCE { nguyen_nguyen_le_phung_icpr16distributed,
        AUTHOR = { Nguyen, Tu and Nguyen, Vu and Le, Trung and Phung, Dinh },
        TITLE = { Distributed Data Augmented {S}upport {V}ector {M}achine on {S}park },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        FILE = { :nguyen_nguyen_le_phung_icpr16distributed - Distributed Data Augmented Support Vector Machine on Spark.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Faster Training of Very Deep Networks via p-Norm Gates
    Pham, Trang, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ]
    @CONFERENCE { pham_tran_phung_venkatesh_icpr16faster,
        AUTHOR = { Pham, Trang and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Faster Training of Very Deep Networks via p-Norm Gates },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        FILE = { :pham_tran_phung_venkatesh_icpr16faster - Faster Training of Very Deep Networks Via P Norm Gates.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Transfer Learning for Rare Cancer Problems via Discriminative Sparse Gaussian Graphical Model
    Saha, Budhaditya, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ]
    @CONFERENCE { budhaditya_gupta_phung_venkatesh_icpr16transfer,
        AUTHOR = { Saha, Budhaditya and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Transfer Learning for Rare Cancer Problems via Discriminative Sparse {G}aussian Graphical Model },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        FILE = { :budhaditya_gupta_phung_venkatesh_icpr16transfer - Transfer Learning for Rare Cancer Problems Via Discriminative Sparse Gaussian Graphical Model.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Model-based Classification and Novelty Detection For Point Pattern Data
    Vo, Ba-Ngu, Tran, Nhat-Quang, Phung, Dinh and Vo, Ba-Tuong. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ]
    @CONFERENCE { vo_tran_phung_vo_icpr16model,
        AUTHOR = { Vo, Ba-Ngu and Tran, Nhat-Quang and Phung, Dinh and Vo, Ba-Tuong },
        TITLE = { Model-based Classification and Novelty Detection For Point Pattern Data },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        FILE = { :vo_tran_phung_vo_icpr16model - Model Based Classification and Novelty Detection for Point Pattern Data.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Clustering For Point Pattern Data
    Tran, Nhat-Quang, Vo, Ba-Ngu, Phung, Dinh and Vo, Ba-Tuong. In 23rd Intl. Conf. on Pattern Recognition (ICPR), Dec. 2016. [ | ]
    @CONFERENCE { tran_vo_phung_vo_icpr16clustering,
        AUTHOR = { Tran, Nhat-Quang and Vo, Ba-Ngu and Phung, Dinh and Vo, Ba-Tuong },
        TITLE = { Clustering For Point Pattern Data },
        BOOKTITLE = { 23rd Intl. Conf. on Pattern Recognition (ICPR) },
        YEAR = { 2016 },
        MONTH = { Dec. },
        FILE = { :tran_vo_phung_vo_icpr16clustering - Clustering for Point Pattern Data.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
C
  • Discriminative cues for different stages of smoking cessation in online community
    Nguyen, Thin, Borland, Ron, Yearwood, John, Yong, Hua, Venkatesh, Svetha and Phung, Dinh. In 17th Intl. Conf. on Web Information Systems Engineering (WISE), Nov. 2016. [ | ]
    Smoking is the largest single cause of premature mortality, being responsible for about six million deaths annually worldwide. Most smokers want to quit, but many have problems. The Internet enables people interested in quitting smoking to connect with others via online communities; however, the characteristics of these discussions are not well understood. This work aims to explore the textual cues of an online community interested in quitting smoking: www.reddit.com/r/stopsmoking -- “a place for redditors to motivate each other to quit smoking”. A large corpus of data was crawled, including thousands of posts made by thousands of users within the community. Four subgroups of posts based on the cessation days of abstainers were defined: S0: within the first week, S1: within the first month (excluding cohort S0), S2: from the second month to one year, and S3: beyond one year. Psycho-linguistic features and content topics were extracted from the posts and analysed. Machine learning techniques were used to discriminate the online conversations in the first week S0 from those of the other subgroups. Topics and psycho-linguistic features were found to be highly valid predictors of the subgroups. Clear discrimination between linguistic features and topics, alongside good predictive power, is an important step in understanding social media and its use in studies of smoking and other addictions in online settings.
    @INPROCEEDINGS { nguyen_etal_wise16discriminative,
        AUTHOR = { Nguyen, Thin and Borland, Ron and Yearwood, John and Yong, Hua and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Discriminative cues for different stages of smoking cessation in online community },
        BOOKTITLE = { 17th Intl. Conf. on Web Information Systems Engineering (WISE) },
        YEAR = { 2016 },
        SERIES = { Lecture Notes in Computer Science },
        MONTH = { Nov. },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { Smoking is the largest single cause of premature mortality, being responsible for about six million deaths annually worldwide. Most smokers want to quit, but many have problems. The Internet enables people interested in quitting smoking to connect with others via online communities; however, the characteristics of these discussions are not well understood. This work aims to explore the textual cues of an online community interested in quitting smoking: www.reddit.com/r/stopsmoking -- “a place for redditors to motivate each other to quit smoking”. A large corpus of data was crawled, including thousands of posts made by thousands of users within the community. Four subgroups of posts based on the cessation days of abstainers were defined: S0: within the first week, S1: within the first month (excluding cohort S0), S2: from the second month to one year, and S3: beyond one year. Psycho-linguistic features and content topics were extracted from the posts and analysed. Machine learning techniques were used to discriminate the online conversations in the first week S0 from those of the other subgroups. Topics and psycho-linguistic features were found to be highly valid predictors of the subgroups. Clear discrimination between linguistic features and topics, alongside good predictive power, is an important step in understanding social media and its use in studies of smoking and other addictions in online settings. },
        FILE = { :nguyen_etal_wise16discriminative - Discriminative Cues for Different Stages of Smoking Cessation in Online Community.pdf:PDF },
        LANGUAGE = { English },
        OWNER = { thinng },
        TIMESTAMP = { 2016.07.14 },
    }
C
  • Large-scale stylistic analysis of formality in academia and social media
    Nguyen, Thin, Venkatesh, Svetha and Phung, Dinh. In 17th Intl. Conf. on Web Information Systems Engineering (WISE), Nov. 2016. [ | ]
    The dictum `publish or perish' has influenced the way scientists present research results so as to get published, including exaggeration and overstatement of research findings. This behavior gives rise to patterns of language use in academia. For example, it has recently been found that the proportion of positive words in the content of scientific articles has risen over the last 40 years, which likely reflects a tendency among scientists to exaggerate and overstate their research results. The practice may deviate from the impersonal and formal style of academic writing. In this study, the degree of formality in scientific articles is investigated through a corpus of 14 million PubMed abstracts. Three aspects of stylistic features are explored: expressing emotional information, using first person pronouns to refer to the authors, and mixing English varieties. These aspects are compared with those of online user-generated media, including online encyclopedias, web-logs, forums, and micro-blogs. Trends in these stylistic features in scientific publications over the last four decades are also discovered. Advances in cluster computing are employed to process large-scale data, with 5.8 terabytes and 3.6 billion data points across all the media. The results suggest the potential of pattern recognition in data at scale.
    @INPROCEEDINGS { nguyen_etal_wise16LargeScale,
        AUTHOR = { Nguyen, Thin and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Large-scale stylistic analysis of formality in academia and social media },
        BOOKTITLE = { 17th Intl. Conf. on Web Information Systems Engineering (WISE) },
        YEAR = { 2016 },
        SERIES = { Lecture Notes in Computer Science },
        MONTH = { Nov. },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { The dictum `publish or perish' has influenced the way scientists present research results so as to get published, including exaggeration and overstatement of research findings. This behavior gives rise to patterns of language use in academia. For example, it has recently been found that the proportion of positive words in the content of scientific articles has risen over the last 40 years, which likely reflects a tendency among scientists to exaggerate and overstate their research results. The practice may deviate from the impersonal and formal style of academic writing. In this study, the degree of formality in scientific articles is investigated through a corpus of 14 million PubMed abstracts. Three aspects of stylistic features are explored: expressing emotional information, using first person pronouns to refer to the authors, and mixing English varieties. These aspects are compared with those of online user-generated media, including online encyclopedias, web-logs, forums, and micro-blogs. Trends in these stylistic features in scientific publications over the last four decades are also discovered. Advances in cluster computing are employed to process large-scale data, with 5.8 terabytes and 3.6 billion data points across all the media. The results suggest the potential of pattern recognition in data at scale. },
        FILE = { :nguyen_etal_wise16LargeScale - Large Scale Stylistic Analysis of Formality in Academia and Social Media.pdf:PDF },
        LANGUAGE = { English },
        OWNER = { thinng },
        TIMESTAMP = { 2016.07.14 },
    }
C
  • Learning Multifaceted Latent Activities from Heterogeneous Mobile Data
    Nguyen, Thanh-Binh, Nguyen, Vu, Nguyen, Thuong, Venkatesh, Svetha, Kumar, Mohan and Phung, Dinh. In 3rd Intl. Conf. on Data Science and Advanced Analytics (DSAA), Oct. 2016. [ | ]
    Inferring abstract contexts and activities from heterogeneous data is vital to context-aware ubiquitous applications but still remains one of the most challenging problems. Recent advances in Bayesian nonparametric machine learning, in particular the theory of topic models based on the Hierarchical Dirichlet Process (HDP), have provided an elegant solution towards these challenges. However, none of the existing methods has addressed the problem of inferring latent multifaceted activities and contexts from heterogeneous data sources such as those collected from mobile devices. In this paper, we extend the original HDP to model heterogeneous data using a richer structure for the base measure, namely a product-space. The proposed model, called product-space HDP (PS-HDP), naturally handles heterogeneous data from multiple sources and identifies the unknown number of latent structures in a principled way. Although this framework is generic, our current work primarily focuses on inferring (latent) threefold activities of who-when-where simultaneously, which corresponds to inducing activities from data collected for identity, location and time. We demonstrate our model on synthetic data as well as on a real-world dataset, the StudentLife dataset. We report results and provide analysis of the discovered activities and patterns to demonstrate the merit of the model. We also quantitatively evaluate the performance of the PS-HDP model using standard metrics, including F1-score, NMI, RI and purity, and compare it with well-known existing baseline methods.
    @INPROCEEDINGS { nguyen_etal_dsaa16learning,
        AUTHOR = { Nguyen, Thanh-Binh and Nguyen, Vu and Nguyen, Thuong and Venkatesh, Svetha and Kumar, Mohan and Phung, Dinh },
        TITLE = { Learning Multifaceted Latent Activities from Heterogeneous Mobile Data },
        BOOKTITLE = { 3rd Intl. Conf. on Data Science and Advanced Analytics (DSAA) },
        YEAR = { 2016 },
        MONTH = { Oct. },
        ABSTRACT = { Inferring abstract contexts and activities from heterogeneous data is vital to context-aware ubiquitous applications but still remains one of the most challenging problems. Recent advances in Bayesian nonparametric machine learning, in particular the theory of topic models based on the Hierarchical Dirichlet Process (HDP), have provided an elegant solution towards these challenges. However, none of the existing methods has addressed the problem of inferring latent multifaceted activities and contexts from heterogeneous data sources such as those collected from mobile devices. In this paper, we extend the original HDP to model heterogeneous data using a richer structure for the base measure, namely a product-space. The proposed model, called product-space HDP (PS-HDP), naturally handles heterogeneous data from multiple sources and identifies the unknown number of latent structures in a principled way. Although this framework is generic, our current work primarily focuses on inferring (latent) threefold activities of who-when-where simultaneously, which corresponds to inducing activities from data collected for identity, location and time. We demonstrate our model on synthetic data as well as on a real-world dataset, the StudentLife dataset. We report results and provide analysis of the discovered activities and patterns to demonstrate the merit of the model. We also quantitatively evaluate the performance of the PS-HDP model using standard metrics, including F1-score, NMI, RI and purity, and compare it with well-known existing baseline methods. },
        FILE = { :nguyen_etal_dsaa16learning - Learning Multifaceted Latent Activities from Heterogeneous Mobile Data.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.01 },
    }
C
  • Analysing the History of Autism Spectrum Disorder using Topic Models
    Beykikhoshk, Adham, Arandjelović, Ognjen, Venkatesh, Svetha and Phung, Dinh. In 3rd Intl. Conf. on Data Science and Advanced Analytics (DSAA), Oct. 2016. [ | ]
    We describe a novel framework for the discovery of the underlying topics of a longitudinal collection of scholarly data, and the tracking of their lifetime and popularity over time. Unlike social media or news data, topic nuances in science cause new scientific directions to emerge; hence, a new approach to modelling longitudinal literature data is to use topics which remain identifiable over the course of time. Current studies either disregard the time dimension or treat it as an exchangeable covariate when they fix the topics over time, or do not share the topics over epochs when they model the time naturally. We address these issues by adopting a non-parametric Bayesian approach. We assume the data is partially exchangeable and divide it into consecutive epochs. Then, by fixing the topics in a recurrent Chinese restaurant franchise, we impose a static topical structure on the corpus such that the topics are shared across epochs and the documents within epochs. We demonstrate the effectiveness of the proposed framework on a collection of medical literature related to autism spectrum disorder. We collect a large corpus of publications and carefully examine two important research issues of the domain as case studies. Moreover, we make the results of our experiments and the source code of the model freely available to aid other researchers in analysing the results or applying the model to their own data collections.
    @INPROCEEDINGS { beykikhoshk_etal_dsaa16analysing,
        AUTHOR = { Beykikhoshk, Adham and Arandjelovi\'{c}, Ognjen and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Analysing the History of Autism Spectrum Disorder using Topic Models },
        BOOKTITLE = { 3rd Intl. Conf. on Data Science and Advanced Analytics (DSAA) },
        YEAR = { 2016 },
        MONTH = { Oct. },
        ABSTRACT = { We describe a novel framework for the discovery of the underlying topics of a longitudinal collection of scholarly data, and the tracking of their lifetime and popularity over time. Unlike social media or news data, topic nuances in science cause new scientific directions to emerge; hence, a new approach to modelling longitudinal literature data is to use topics which remain identifiable over the course of time. Current studies either disregard the time dimension or treat it as an exchangeable covariate when they fix the topics over time, or do not share the topics over epochs when they model the time naturally. We address these issues by adopting a non-parametric Bayesian approach. We assume the data is partially exchangeable and divide it into consecutive epochs. Then, by fixing the topics in a recurrent Chinese restaurant franchise, we impose a static topical structure on the corpus such that the topics are shared across epochs and the documents within epochs. We demonstrate the effectiveness of the proposed framework on a collection of medical literature related to autism spectrum disorder. We collect a large corpus of publications and carefully examine two important research issues of the domain as case studies. Moreover, we make the results of our experiments and the source code of the model freely available to aid other researchers in analysing the results or applying the model to their own data collections. },
        FILE = { :beykikhoshk_etal_dsaa16analysing - Analysing the History of Autism Spectrum Disorder Using Topic Models.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.08.01 },
    }
C
  • A Framework for Mixed-type Multi-outcome Prediction with Applications in Healthcare
    Saha, Budhaditya, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. IEEE Journal of Biomedical and Health Informatics (JBHI), July 2016. [ | ]
    @ARTICLE { budhaditya_gupta_phung_venkatesh_jbhi16framework,
        AUTHOR = { Saha, Budhaditya and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { A Framework for Mixed-type Multi-outcome Prediction with Applications in Healthcare },
        JOURNAL = { IEEE Journal of Biomedical and Health Informatics (JBHI) },
        YEAR = { 2016 },
        MONTH = { July },
        FILE = { :budhaditya_gupta_phung_venkatesh_jbhi16framework - A Framework for Mixed Type Multi Outcome Prediction with Applications in Healthcare.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.07.13 },
    }
J
  • Discovering Latent Affective Transitions among Individuals in Online Mental Health-related Communities
    Dao, Bo, Nguyen, Thin, Venkatesh, Svetha and Phung, Dinh. In IEEE Intl. Conf. on Multimedia and Expo (ICME), Seattle, USA, July 2016. [ | ]
    The discovery of latent affective patterns of individuals with affective disorders will potentially enhance the diagnosis and treatment of mental disorders. This paper studies the phenomena of affective transitions among individuals in online mental health communities. We apply a non-negative matrix factorization model to extract the common and individual factors of affective transitions across groups of individuals at different levels of affective disorder. We examine the latent patterns of emotional transitions and investigate the effects of emotional transitions across the cohorts. We establish a novel framework for utilizing social media as sensors of mood and emotional transitions. This work may form the basis of new systems to screen individuals and communities at high risk of mental health problems in online settings.
    @INPROCEEDINGS { dao_nguyen_venkatesh_phung_icme16,
        AUTHOR = { Dao, Bo and Nguyen, Thin and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Discovering Latent Affective Transitions among Individuals in Online Mental Health-related Communities },
        BOOKTITLE = { IEEE Intl. Conf. on Multimedia and Expo (ICME) },
        YEAR = { 2016 },
        ADDRESS = { Seattle, USA },
        MONTH = { July },
        PUBLISHER = { IEEE },
        ABSTRACT = { The discovery of latent affective patterns of individuals with affective disorders will potentially enhance the diagnosis and treatment of mental disorders. This paper studies the phenomena of affective transitions among individuals in online mental health communities. We apply a non-negative matrix factorization model to extract the common and individual factors of affective transitions across groups of individuals at different levels of affective disorder. We examine the latent patterns of emotional transitions and investigate the effects of emotional transitions across the cohorts. We establish a novel framework for utilizing social media as sensors of mood and emotional transitions. This work may form the basis of new systems to screen individuals and communities at high risk of mental health problems in online settings. },
        FILE = { :dao_nguyen_venkatesh_phung_icme16 - Discovering Latent Affective Transitions among Individuals in Online Mental Health-related Communities.pdf:PDF },
        OWNER = { dbdao },
        TIMESTAMP = { 2016.03.20 },
    }
C
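The matrix-factorization step described in the abstract above can be sketched in a few lines. This is an illustrative toy only, assuming made-up data and basic Lee–Seung multiplicative updates; it is not the paper's model (which separates common and individual factors across cohorts) or its data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: rows = individuals, columns = affect-transition counts.
V = rng.random((20, 12))

def nmf(V, k=4, iters=200, eps=1e-9):
    """Factorize V ~ W @ H with non-negative W, H via multiplicative updates."""
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update topic/factor loadings
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update per-individual weights
    return W, H

W, H = nmf(V)
recon_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Rows of H would play the role of shared transition patterns, and rows of W the per-individual mixing weights over those patterns.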
  • Hierarchical Bayesian nonparametric models for knowledge discovery from electronic medical records
    Li, Cheng, Rana, Santu, Phung, Dinh and Venkatesh, Svetha. Knowledge-Based Systems (KBS), 99(1):168-182, May 2016. [ | | pdf]
    Electronic Medical Record (EMR) has established itself as a valuable resource for large scale analysis of health data. A hospital EMR dataset typically consists of medical records of hospitalized patients. A medical record contains diagnostic information (diagnosis codes), procedures performed (procedure codes) and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), can be employed to discover disease topics from EMR data by treating patients as documents and diagnosis codes as words. This topic modeling helps to understand the constitution of patient diseases and offers a tool for better planning of treatment. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically coherent disease topics. We are motivated by the fact that diagnosis codes are connected in the ICD-10 tree structure, which encodes semantic relationships between codes. We exploit a decay function to incorporate distances between words at the bottom level of the wddCRF. Efficient inference is derived for the wddCRF using MCMC techniques. Furthermore, since procedure codes are often correlated with diagnosis codes, we develop the correspondence wddCRF (Corr-wddCRF) to explore conditional relationships of procedure codes for a given disease pattern. Efficient collapsed Gibbs sampling is derived for the Corr-wddCRF. We evaluate the proposed models on two real-world medical datasets – PolyVascular disease and Acute Myocardial Infarction disease. We demonstrate that the Corr-wddCRF model discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that using features from the Corr-wddCRF outperforms the baselines on 14-day readmission prediction. Besides this, the prediction of procedure codes based on the Corr-wddCRF also shows considerable accuracy.
    @ARTICLE { li_rana_phung_venkatesh_kbs16hierarchical,
        AUTHOR = { Li, Cheng and Rana, Santu and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Hierarchical {B}ayesian nonparametric models for knowledge discovery from electronic medical records },
        JOURNAL = { Knowledge-Based Systems (KBS) },
        YEAR = { 2016 },
        VOLUME = { 99 },
        NUMBER = { 1 },
        PAGES = { 168--182 },
        MONTH = { May },
        ISSN = { 0950-7051 },
        ABSTRACT = { Electronic Medical Record (EMR) has established itself as a valuable resource for large scale analysis of health data. A hospital EMR dataset typically consists of medical records of hospitalized patients. A medical record contains diagnostic information (diagnosis codes), procedures performed (procedure codes) and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), can be employed to discover disease topics from EMR data by treating patients as documents and diagnosis codes as words. This topic modeling helps to understand the constitution of patient diseases and offers a tool for better planning of treatment. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically coherent disease topics. We are motivated by the fact that diagnosis codes are connected in the ICD-10 tree structure, which encodes semantic relationships between codes. We exploit a decay function to incorporate distances between words at the bottom level of the wddCRF. Efficient inference is derived for the wddCRF using MCMC techniques. Furthermore, since procedure codes are often correlated with diagnosis codes, we develop the correspondence wddCRF (Corr-wddCRF) to explore conditional relationships of procedure codes for a given disease pattern. Efficient collapsed Gibbs sampling is derived for the Corr-wddCRF. We evaluate the proposed models on two real-world medical datasets – PolyVascular disease and Acute Myocardial Infarction disease. We demonstrate that the Corr-wddCRF model discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that using features from the Corr-wddCRF outperforms the baselines on 14-day readmission prediction. Besides this, the prediction of procedure codes based on the Corr-wddCRF also shows considerable accuracy. },
        DOI = { http://dx.doi.org/10.1016/j.knosys.2016.02.005 },
        FILE = { :li_rana_phung_venkatesh_kbs16hierarchical - Hierarchical Bayesian Nonparametric Models for Knowledge Discovery from Electronic Medical Records.pdf:PDF },
        KEYWORDS = { Bayesian nonparametric models; Correspondence models; Word distances; Disease topics; Readmission prediction; Procedure codes prediction },
        URL = { http://www.sciencedirect.com/science/article/pii/S0950705116000836 },
    }
J
  • Learning Multi-faceted Activities from Heterogeneous Data with the Product Space Hierarchical Dirichlet Processes
    Nguyen, T-B., Nguyen, V., Venkatesh, S. and Phung, D.. In 3rd PAKDD Workshop on Machine Learning for Sensory Data Analysis (MLSDA), pages 128-140, April 2016. [ | ]
    The hierarchical Dirichlet process (HDP) was originally designed for, and experimented with, a single data channel. In this paper we enhance its ability to model heterogeneous data by using a richer structure for the base measure, namely a product space. The enhanced model, called Product Space HDP (PS-HDP), can (1) simultaneously model heterogeneous data from multiple sources in a Bayesian nonparametric framework, hence inheriting its strengths and advantages, including the ability to automatically grow the model complexity, and (2) discover multilevel latent structures from data, resulting in different types of topics/latent structures that can be explained jointly. We experimented with the MDC dataset, a large real-world dataset collected from mobile phones. Our goal was to discover identity--location--time (a.k.a. who-where-when) patterns at different levels (globally for all groups and locally for each group). We provide analysis of the activities and patterns learned from our model, visualized, compared and contrasted with the ground truth to demonstrate the merit of the proposed framework. We further quantitatively evaluate and report its performance using standard metrics including F1-score, NMI, RI, and purity. We also compare the performance of the PS-HDP model with those of popular existing clustering methods (including K-Means, NNMF, GMM, DP-Means, and AP). Lastly, we demonstrate the ability of the model to learn activities with missing data, a common problem encountered in pervasive and ubiquitous computing applications.
    @INPROCEEDINGS { nguyen_nguyen_venkatesh_phung_mlsda16learning,
        AUTHOR = { Nguyen, T-B. and Nguyen, V. and Venkatesh, S. and Phung, D. },
        TITLE = { Learning Multi-faceted Activities from Heterogeneous Data with the Product Space Hierarchical {D}irichlet Processes },
        BOOKTITLE = { 3rd PAKDD Workshop on Machine Learning for Sensory Data Analysis (MLSDA) },
        YEAR = { 2016 },
        PAGES = { 128--140 },
        MONTH = { April },
        ABSTRACT = { The hierarchical Dirichlet process (HDP) was originally designed for, and experimented with, a single data channel. In this paper we enhance its ability to model heterogeneous data by using a richer structure for the base measure, namely a product space. The enhanced model, called Product Space HDP (PS-HDP), can (1) simultaneously model heterogeneous data from multiple sources in a Bayesian nonparametric framework, hence inheriting its strengths and advantages, including the ability to automatically grow the model complexity, and (2) discover multilevel latent structures from data, resulting in different types of topics/latent structures that can be explained jointly. We experimented with the MDC dataset, a large real-world dataset collected from mobile phones. Our goal was to discover identity--location--time (a.k.a. who-where-when) patterns at different levels (globally for all groups and locally for each group). We provide analysis of the activities and patterns learned from our model, visualized, compared and contrasted with the ground truth to demonstrate the merit of the proposed framework. We further quantitatively evaluate and report its performance using standard metrics including F1-score, NMI, RI, and purity. We also compare the performance of the PS-HDP model with those of popular existing clustering methods (including K-Means, NNMF, GMM, DP-Means, and AP). Lastly, we demonstrate the ability of the model to learn activities with missing data, a common problem encountered in pervasive and ubiquitous computing applications. },
        FILE = { :nguyen_nguyen_venkatesh_phung_mlsda16learning - Learning Multi Faceted Activities from Heterogeneous Data with the Product Space Hierarchical Dirichlet Processes.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
    }
C
  • Neural Choice by Elimination via Highway Networks
    Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In 5th PAKDD Workshop on Biologically Inspired Data Mining Techniques, April 2016. [ | ]
    We introduce Neural Choice by Elimination, a new framework that integrates deep neural networks into probabilistic sequential choice models for learning to rank. Given a set of items to choose from, the elimination strategy starts with the whole item set and iteratively eliminates the least worthy item in the remaining subset. We prove that choice by elimination is equivalent to marginalizing out the random Gompertz latent utilities. Coupled with the choice model is the recently introduced Neural Highway Networks for approximating arbitrarily complex rank functions. We evaluate the proposed framework on a large-scale public dataset with over 425K items, drawn from the Yahoo! learning to rank challenge. It is demonstrated that the proposed method is competitive against state-of-the-art learning to rank methods.
    @INPROCEEDINGS { tran_phung_venkatesh_bmd16neural,
        AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Neural Choice by Elimination via Highway Networks },
        BOOKTITLE = { 5th PAKDD Workshop on Biologically Inspired Data Mining Techniques },
        YEAR = { 2016 },
        MONTH = { April },
        ABSTRACT = { We introduce Neural Choice by Elimination, a new framework that integrates deep neural networks into probabilistic sequential choice models for learning to rank. Given a set of items to choose from, the elimination strategy starts with the whole item set and iteratively eliminates the least worthy item in the remaining subset. We prove that choice by elimination is equivalent to marginalizing out the random Gompertz latent utilities. Coupled with the choice model is the recently introduced Neural Highway Networks for approximating arbitrarily complex rank functions. We evaluate the proposed framework on a large-scale public dataset with over 425K items, drawn from the Yahoo! learning to rank challenge. It is demonstrated that the proposed method is competitive against state-of-the-art learning to rank methods. },
        FILE = { :tran_phung_venkatesh_bmd16neural - Neural Choice by Elimination Via Highway Networks.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
    }
C
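The elimination strategy in the entry above, stripped of its probabilistic machinery, reduces to repeatedly discarding the currently least worthy item. The sketch below shows only that deterministic skeleton on fixed scores; the paper's actual contribution (Gompertz latent utilities and a highway-network rank function) is not reproduced here.

```python
def rank_by_elimination(scores):
    """Rank items best-first by repeatedly eliminating the least worthy one."""
    items = list(range(len(scores)))
    elimination_order = []
    while items:
        worst = min(items, key=lambda i: scores[i])  # least worthy remaining item
        items.remove(worst)
        elimination_order.append(worst)
    # Items eliminated last are the most worthy, so reverse the order.
    return elimination_order[::-1]

ranking = rank_by_elimination([0.2, 0.9, 0.5])  # -> [1, 2, 0]
```

In the probabilistic version, each elimination step would be a random choice governed by the items' latent utilities rather than a deterministic argmin.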
  • DeepCare: A Deep Dynamic Memory Model for Predictive Medicine
    Pham, Trang, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD), pages 30-41, April 2016. [ | | pdf]
    Personalized predictive medicine necessitates modeling of patient illness and care processes, which inherently have long-term temporal dependencies. Healthcare observations, recorded in electronic medical records, are episodic and irregular in time. We introduce DeepCare, a deep dynamic neural network that reads medical records and predicts future medical outcomes. At the data level, DeepCare models patient health state trajectories with explicit memory of illness. Built on Long Short-Term Memory (LSTM), DeepCare introduces time parameterizations to handle irregular timing by moderating the forgetting and consolidation of illness memory. DeepCare also incorporates medical interventions that change the course of illness and shape future medical risk. Moving up to the health state level, historical and present health states are then aggregated through multiscale temporal pooling, before passing through a neural network that estimates future outcomes. We demonstrate the efficacy of DeepCare for disease progression modeling and readmission prediction in diabetes, a chronic disease with large economic burden. The results show improved modeling and risk prediction accuracy.
    @CONFERENCE { pham_tran_phung_venkatesh_pakdd16deepcare,
        AUTHOR = { Pham, Trang and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { {DeepCare}: A Deep Dynamic Memory Model for Predictive Medicine },
        BOOKTITLE = { Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD) },
        YEAR = { 2016 },
        VOLUME = { 9652 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 30--41 },
        MONTH = { April },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { Personalized predictive medicine necessitates modeling of patient illness and care processes, which inherently have long-term temporal dependencies. Healthcare observations, recorded in electronic medical records, are episodic and irregular in time. We introduce DeepCare, a deep dynamic neural network that reads medical records and predicts future medical outcomes. At the data level, DeepCare models patient health state trajectories with explicit memory of illness. Built on Long Short-Term Memory (LSTM), DeepCare introduces time parameterizations to handle irregular timing by moderating the forgetting and consolidation of illness memory. DeepCare also incorporates medical interventions that change the course of illness and shape future medical risk. Moving up to the health state level, historical and present health states are then aggregated through multiscale temporal pooling, before passing through a neural network that estimates future outcomes. We demonstrate the efficacy of DeepCare for disease progression modeling and readmission prediction in diabetes, a chronic disease with large economic burden. The results show improved modeling and risk prediction accuracy. },
        DOI = { 10.1007/978-3-319-31750-2_3 },
        FILE = { :pham_tran_phung_venkatesh_pakdd16deepcare - DeepCare_ a Deep Dynamic Memory Model for Predictive Medicine.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-31750-2_3 },
    }
C
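The key mechanism the DeepCare abstract describes, moderating an LSTM's forgetting by the irregular time gap between medical records, can be illustrated with a toy cell-state update. Everything below (the gate parameters, the logarithmic decay form, the sizes) is invented for illustration and is not the paper's parameterization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
d = 8                                          # hidden size (arbitrary)
Wf, Uf, bf = rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d)

def time_decayed_forget(x, h_prev, c_prev, dt):
    """Old-memory contribution to the cell state, damped by elapsed time dt."""
    f = sigmoid(Wf @ x + Uf @ h_prev + bf)     # standard LSTM forget gate
    f = f / np.log(np.e + dt)                  # longer gaps => forget more
    return f * c_prev

x, h, c = rng.normal(size=d), rng.normal(size=d), np.ones(d)
near = time_decayed_forget(x, h, c, dt=1.0)    # admission one day later
far = time_decayed_forget(x, h, c, dt=365.0)   # admission a year later
```

With a positive cell state, the year-later contribution is elementwise smaller than the day-later one, which is the intended effect: stale illness memory fades faster across long gaps between episodes.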
  • Sparse Adaptive Multi-Hyperplane Machine
    Nguyen, Khanh, Le, Trung, Nguyen, Vu and Phung, Dinh. In Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD), pages 27-39, April 2016. [ | | pdf]
    The Adaptive Multiple-hyperplane Machine (AMM) was recently proposed to deal with large-scale datasets. However, it has no principle to tune the complexity and sparsity levels of the solution. Addressing the sparsity is important to improve learning generalization, prediction accuracy and computational speedup. In this paper, we employ the max-margin principle and sparse approach to propose a new Sparse AMM (SAMM). We solve the new optimization objective function with stochastic gradient descent (SGD). Besides inheriting the good features of SGD-based learning method and the original AMM, our proposed Sparse AMM provides machinery and flexibility to tune the complexity and sparsity of the solution, making it possible to avoid overfitting and underfitting. We validate our approach on several large benchmark datasets. We show that with the ability to control sparsity, the proposed Sparse AMM yields superior classification accuracy to the original AMM while simultaneously achieving computational speedup.
    @CONFERENCE { nguyen_le_nguyen_phung_pakdd16sparse,
        AUTHOR = { Nguyen, Khanh and Le, Trung and Nguyen, Vu and Phung, Dinh },
        TITLE = { Sparse Adaptive Multi-Hyperplane Machine },
        BOOKTITLE = { Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD) },
        YEAR = { 2016 },
        VOLUME = { 9651 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 27--39 },
        MONTH = { April },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { The Adaptive Multiple-hyperplane Machine (AMM) was recently proposed to deal with large-scale datasets. However, it has no principle to tune the complexity and sparsity levels of the solution. Addressing the sparsity is important to improve learning generalization, prediction accuracy and computational speedup. In this paper, we employ the max-margin principle and sparse approach to propose a new Sparse AMM (SAMM). We solve the new optimization objective function with stochastic gradient descent (SGD). Besides inheriting the good features of SGD-based learning method and the original AMM, our proposed Sparse AMM provides machinery and flexibility to tune the complexity and sparsity of the solution, making it possible to avoid overfitting and underfitting. We validate our approach on several large benchmark datasets. We show that with the ability to control sparsity, the proposed Sparse AMM yields superior classification accuracy to the original AMM while simultaneously achieving computational speedup. },
        DOI = { 10.1007/978-3-319-31753-3_3 },
        FILE = { :nguyen_le_nguyen_phung_pakdd16sparse - Sparse Adaptive Multi Hyperplane Machine.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-31753-3_3 },
    }
C
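The SAMM entry above builds on SGD-based max-margin learning. As a generic sketch of that style of training (not the SAMM algorithm itself, which learns multiple hyperplanes per class with explicit sparsity control), here is Pegasos-like SGD on the hinge loss with invented data and hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linearly separable data with labels in {-1, +1}.
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0.0, 1, -1)

w, lam = np.zeros(2), 1e-3
for t in range(1, 2001):
    i = rng.integers(len(X))        # sample one training example
    eta = 1.0 / (lam * t)           # Pegasos-style decaying step size
    w *= 1.0 - eta * lam            # shrinkage from the L2 regulariser
    if y[i] * (X[i] @ w) < 1:       # hinge-loss subgradient step
        w += eta * y[i] * X[i]

train_acc = np.mean(np.sign(X @ w) == y)
```

The shrinkage term is what a sparsity-controlled variant would replace or augment: SAMM's stated goal is to expose that complexity/sparsity trade-off as a tunable knob rather than a fixed regularizer.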
  • Toxicity Prediction in Cancer Using Multiple Instance Learning in a Multi-Task Framework
    Li, Cheng, Gupta, Sunil, Rana, Santu, Luo, Wei, Venkatesh, Svetha, Ashley, David and Phung, Dinh. In Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD), pages 152-164, April 2016. [ | | pdf]
    Treatments of cancer cause severe side effects called toxicities. Reduction of such effects is crucial in cancer care. To impact care, we need to predict toxicities at fortnightly intervals. This toxicity data differs from traditional time series data, as toxicities can be caused by a single treatment on a given day alone, and thus it is necessary to consider the effect of the singular data vector causing toxicity. We model the data before prediction points using multiple instance learning, where each bag is composed of multiple instances associated with daily treatments and patient-specific attributes, such as chemotherapy, radiotherapy, age and cancer types. We then formulate a Bayesian multi-task framework to enhance toxicity prediction at each prediction point. The use of the prior allows factors to be shared across task predictors. Our proposed method simultaneously captures the heterogeneity of daily treatments and performs toxicity prediction at different prediction points. Our method was evaluated on a real-world dataset of more than 2000 cancer patients and achieved better prediction accuracy in terms of AUC than state-of-the-art baselines.
    @CONFERENCE { li_gupta_rana_luo_venkatesh_ashley_phung_pakdd16toxicity,
        AUTHOR = { Li, Cheng and Gupta, Sunil and Rana, Santu and Luo, Wei and Venkatesh, Svetha and Ashley, David and Phung, Dinh },
        TITLE = { Toxicity Prediction in Cancer Using Multiple Instance Learning in a Multi-Task Framework },
        BOOKTITLE = { Pacific Asia Knowledge Discovery and Data Mining Conference (PAKDD) },
        YEAR = { 2016 },
        PAGES = { 152--164 },
        MONTH = { April },
        PUBLISHER = { Springer },
        ABSTRACT = { Treatments of cancer cause severe side effects called toxicities. Reduction of such effects is crucial in cancer care. To impact care, we need to predict toxicities at fortnightly intervals. This toxicity data differs from traditional time series data, as toxicities can be caused by a single treatment on a given day alone, and thus it is necessary to consider the effect of the singular data vector causing toxicity. We model the data before prediction points using multiple instance learning, where each bag is composed of multiple instances associated with daily treatments and patient-specific attributes, such as chemotherapy, radiotherapy, age and cancer types. We then formulate a Bayesian multi-task framework to enhance toxicity prediction at each prediction point. The use of the prior allows factors to be shared across task predictors. Our proposed method simultaneously captures the heterogeneity of daily treatments and performs toxicity prediction at different prediction points. Our method was evaluated on a real-world dataset of more than 2000 cancer patients and achieved better prediction accuracy in terms of AUC than state-of-the-art baselines. },
        DOI = { 10.1007/978-3-319-31753-3_13 },
        FILE = { :li_gupta_rana_luo_venkatesh_ashley_phung_pakdd16toxicity - Toxicity Prediction in Cancer Using Multiple Instance Learning in a Multi Task Framework.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-31753-3_13 },
    }
C
  • Modelling Human Preferences for Ranking and Collaborative Filtering: A Probabilistic Ordered Partition Approach
    Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 47(1):157-188, April 2016. [ | | pdf]
    @ARTICLE { tran_phung_venkatesh_kais16,
        AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Modelling Human Preferences for Ranking and Collaborative Filtering: A Probabilistic Ordered Partition Approach },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2016 },
        VOLUME = { 47 },
        NUMBER = { 1 },
        PAGES = { 157--188 },
        MONTH = { April },
        DOI = { 10.1007/s10115-015-0840-9 },
        FILE = { :tran_phung_venkatesh_kais16 - Modelling Human Preferences for Ranking and Collaborative Filtering_ a Probabilistic Ordered Partition Approach.pdf:PDF },
        KEYWORDS = { Preference learning; Learning-to-rank; Collaborative filtering; Probabilistic ordered partition model; Set-based ranking; Probabilistic reasoning },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.03.02 },
        URL = { http://link.springer.com/article/10.1007%2Fs10115-015-0840-9 },
    }
J
  • Consistency of the Health of the Nation Outcome Scales (HoNOS) at inpatient-to-community transition
    Luo, Wei, Harvey, Richard, Tran, Truyen, Phung, Dinh, Venkatesh, Svetha and Connor, Jason P. BMJ Open, 6(4):e010732, April 2016. [ | | pdf]
    Objectives: The Health of the Nation Outcome Scales (HoNOS) are mandated outcome measures in many mental-health jurisdictions. When HoNOS are used in different care settings, it is important to assess if setting-specific bias exists. This article examines the consistency of HoNOS in a sample of psychiatric patients transitioned from acute inpatient care and community centres. Setting: A regional mental health service with both acute and community facilities. Participants: 111 psychiatric patients were transferred from inpatient care to community care from 2012 to 2014. Their HoNOS scores were extracted from a clinical database; each inpatient-discharge assessment was followed by a community-intake assessment, with the median period between assessments being 4 days (range 0–14). Assessor experience and professional background were recorded. Primary and secondary outcome measures: The difference of HoNOS at inpatient-discharge and community-intake was assessed with Pearson correlation, Cohen's κ and effect size. Results: Inpatient-discharge HoNOS was on average lower than community-intake HoNOS. The average HoNOS was 8.05 at discharge (median 7, range 1–22), and 12.16 at intake (median 12, range 1–25), an average increase of 4.11 (SD 6.97). Pearson correlation between the two total scores was 0.073 (95% CI −0.095 to 0.238) and Cohen's κ was 0.02 (95% CI −0.02 to 0.06). Differences did not appear to depend on assessor experience or professional background. Conclusions: Systematic change in the HoNOS occurs at inpatient-to-community transition. Some caution should be exercised in making direct comparisons between inpatient HoNOS and community HoNOS scores.
    @ARTICLE { luo_harvey_tran_phung_venkatesh_connor_bmj16consistency,
        AUTHOR = { Luo, Wei and Harvey, Richard and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha and Connor, Jason P },
        TITLE = { Consistency of the Health of the Nation Outcome Scales ({HoNOS}) at inpatient-to-community transition },
        JOURNAL = { BMJ Open },
        YEAR = { 2016 },
        VOLUME = { 6 },
        NUMBER = { 4 },
        PAGES = { e010732 },
        MONTH = { April },
        ABSTRACT = { Objectives: The Health of the Nation Outcome Scales (HoNOS) are mandated outcome measures in many mental-health jurisdictions. When HoNOS are used in different care settings, it is important to assess if setting-specific bias exists. This article examines the consistency of HoNOS in a sample of psychiatric patients transitioned from acute inpatient care and community centres. Setting: A regional mental health service with both acute and community facilities. Participants: 111 psychiatric patients were transferred from inpatient care to community care from 2012 to 2014. Their HoNOS scores were extracted from a clinical database; each inpatient-discharge assessment was followed by a community-intake assessment, with the median period between assessments being 4 days (range 0–14). Assessor experience and professional background were recorded. Primary and secondary outcome measures: The difference of HoNOS at inpatient-discharge and community-intake was assessed with Pearson correlation, Cohen's κ and effect size. Results: Inpatient-discharge HoNOS was on average lower than community-intake HoNOS. The average HoNOS was 8.05 at discharge (median 7, range 1–22), and 12.16 at intake (median 12, range 1–25), an average increase of 4.11 (SD 6.97). Pearson correlation between the two total scores was 0.073 (95% CI −0.095 to 0.238) and Cohen's κ was 0.02 (95% CI −0.02 to 0.06). Differences did not appear to depend on assessor experience or professional background. Conclusions: Systematic change in the HoNOS occurs at inpatient-to-community transition. Some caution should be exercised in making direct comparisons between inpatient HoNOS and community HoNOS scores. },
        DOI = { 10.1136/bmjopen-2015-010732 },
        FILE = { :luo_harvey_tran_phung_venkatesh_connor_bmj16consistency - Consistency of the Health of the Nation Outcome Scales (HoNOS) at Inpatient to Community Transition.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        PUBLISHER = { British Medical Journal Publishing Group },
        TIMESTAMP = { 2016.05.10 },
        URL = { http://bmjopen.bmj.com/content/6/4/e010732.full },
    }
J
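The two agreement measures reported in the HoNOS study above (Pearson r and Cohen's κ) are easy to compute directly. The sketch below uses made-up rating vectors, not the study data; the κ implementation follows the standard chance-corrected agreement formula.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical paired ratings (e.g. discharge vs intake) over 4 categories.
discharge = rng.integers(0, 4, size=100)
intake = rng.integers(0, 4, size=100)

# Pearson correlation between the two score vectors.
r = np.corrcoef(discharge, intake)[0, 1]

def cohens_kappa(a, b):
    """Chance-corrected agreement between two raters over the same items."""
    cats = np.union1d(a, b)
    p_obs = np.mean(a == b)                                   # observed agreement
    p_exp = sum(np.mean(a == c) * np.mean(b == c) for c in cats)  # chance agreement
    return (p_obs - p_exp) / (1.0 - p_exp)

kappa = cohens_kappa(discharge, intake)
```

On independent random ratings both statistics hover near zero, which mirrors the paper's finding of near-zero r and κ across the care transition.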
  • A Framework for Classifying Online Mental Health Related Communities with an Interest in Depression
    Saha, Budhaditya, Nguyen, Thin, Phung, Dinh and Venkatesh, Svetha. IEEE Journal of Biomedical and Health Informatics (JBHI), PP(99):1-1, March 2016. [ | | pdf]
    Mental illness has a deep impact on individuals, families, and by extension, society as a whole. Social networks allow individuals with mental disorders to communicate with other sufferers via online communities, providing an invaluable resource for studies on textual signs of psychological health problems. Mental disorders often occur in combinations, e.g., a patient with an anxiety disorder may also develop depression. This co-occurring mental health condition provides the focus for our work on classifying online communities with an interest in depression. For this, we have crawled a large body of 620,000 posts made by 80,000 users in 247 online communities. We have extracted the topics and psycho-linguistic features expressed in the posts, using these as inputs to our model. Using a machine learning approach, we have formulated a joint modelling framework to classify mental health-related co-occurring online communities from these features. Finally, we performed empirical validation of the model on the crawled dataset, where our model outperforms recent state-of-the-art baselines.
    @ARTICLE { budhaditya_nguyen_phung_venkatesh_bhi16framework,
        AUTHOR = { Saha, Budhaditya and Nguyen, Thin and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { A Framework for Classifying Online Mental Health Related Communities with an Interest in Depression },
        JOURNAL = { IEEE Journal of Biomedical and Health Informatics (JBHI) },
        YEAR = { 2016 },
        VOLUME = { PP },
        NUMBER = { 99 },
        PAGES = { 1-1 },
        MONTH = { March },
        ISSN = { 2168-2194 },
        ABSTRACT = { Mental illness has a deep impact on individuals, families, and by extension, society as a whole. Social networks allow individuals with mental disorders to communicate with other sufferers via online communities, providing an invaluable resource for studies on textual signs of psychological health problems. Mental disorders often occur in combinations, e.g., a patient with an anxiety disorder may also develop depression. This co-occurring mental health condition provides the focus for our work on classifying online communities with an interest in depression. For this, we have crawled a large body of 620,000 posts made by 80,000 users in 247 online communities. We have extracted the topics and psycho-linguistic features expressed in the posts, using these as inputs to our model. Using a machine learning approach, we have formulated a joint modelling framework to classify mental health-related co-occurring online communities from these features. Finally, we performed empirical validation of the model on the crawled dataset, where our model outperforms recent state-of-the-art baselines. },
        DOI = { 10.1109/JBHI.2016.2543741 },
        FILE = { :budhaditya_nguyen_phung_venkatesh_bhi16framework - A Framework for Classifying Online Mental Health Related Communities with an Interest in Depression.pdf:PDF },
        KEYWORDS = { Blogs;Correlation;Covariance matrices;Feature extraction;Informatics;Media;Pragmatics },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7436759&tag=1 },
    }
J
  • A new transfer learning framework with application to model-agnostic multi-task learning
    Gupta, Sunil, Rana, Santu, Saha, Budhaditya, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), February 2016. [ | | pdf]
    Learning from a small number of examples is a challenging problem in machine learning. An effective way to improve performance is to exploit knowledge from other related tasks. Multi-task learning (MTL) is one such useful paradigm that aims to improve performance by jointly modeling multiple related tasks. Although there exist numerous classification and regression models in the machine learning literature, most MTL models are built around ridge or logistic regression. There exist some limited works that propose multi-task extensions of techniques such as support vector machines and Gaussian processes. However, all these MTL models are tied to specific classification or regression algorithms, and there is no single MTL algorithm that can be used at a meta level for any given learning algorithm. Addressing this problem, we propose a generic, model-agnostic joint modeling framework that can take any classification or regression algorithm of a practitioner's choice (standard or custom-built) and build its MTL variant. The key observation that drives our framework is that, due to the small number of examples, the estimates of task parameters are usually poor, and we show that this leads to an under-estimation of task relatedness between any two tasks with high probability. We derive an algorithm that brings the tasks closer to their true relatedness by improving the estimates of task parameters. This is achieved by appropriate sharing of data across tasks. We provide the detailed theoretical underpinning of the algorithm. Through experiments with both synthetic and real datasets, we demonstrate that the multi-task variants of several classifiers/regressors (logistic regression, support vector machine, K-nearest neighbor, random forest, ridge regression, support vector regression) convincingly outperform their single-task counterparts. We also show that the proposed model performs comparably to or better than many state-of-the-art MTL and transfer learning baselines.
    @ARTICLE { gupta_rana_budhaditya_phung_venkatesh_kais16newtransfer,
        AUTHOR = { Gupta, Sunil and Rana, Santu and Saha, Budhaditya and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { A new transfer learning framework with application to model-agnostic multi-task learning },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2016 },
        PAGES = { 1--41 },
        MONTH = { February },
        ISSN = { 0219-3116 },
        ABSTRACT = { Learning from a small number of examples is a challenging problem in machine learning. An effective way to improve performance is to exploit knowledge from other related tasks. Multi-task learning (MTL) is one such paradigm that aims to improve performance by jointly modeling multiple related tasks. Although numerous classification and regression models exist in the machine learning literature, most MTL models are built around ridge or logistic regression. A few works propose multi-task extensions of techniques such as support vector machines and Gaussian processes. However, all these MTL models are tied to specific classification or regression algorithms, and there is no single MTL algorithm that can be used at a meta level for any given learning algorithm. Addressing this problem, we propose a generic, model-agnostic joint modeling framework that can take any classification or regression algorithm of a practitioner's choice (standard or custom-built) and build its MTL variant. The key observation that drives our framework is that, due to the small number of examples, the estimates of task parameters are usually poor, and we show that this leads to an under-estimation of the task relatedness between any two tasks with high probability. We derive an algorithm that brings the tasks closer to their true relatedness by improving the estimates of the task parameters. This is achieved by appropriate sharing of data across tasks. We provide detailed theoretical underpinnings of the algorithm. Through experiments with both synthetic and real datasets, we demonstrate that the multi-task variants of several classifiers/regressors (logistic regression, support vector machine, K-nearest neighbor, random forest, ridge regression, support vector regression) convincingly outperform their single-task counterparts. We also show that the proposed model performs comparably to or better than many state-of-the-art MTL and transfer learning baselines. },
        DOI = { 10.1007/s10115-016-0926-z },
        FILE = { :gupta_rana_budhaditya_phung_venkatesh_kais16newtransfer - A New Transfer Learning Framework with Application to Model Agnostic Multi Task Learning.pdf:PDF },
        KEYWORDS = { Multi-task learning, Model-agnostic framework, Meta algorithm, Classification, Regression },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.10 },
        URL = { http://dx.doi.org/10.1007/s10115-016-0926-z },
    }
J
  • Multiple Task Transfer Learning with Small Sample Sizes
    Saha, Budhaditya, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 46(2):315-342, Feb. 2016. [ | | pdf]
    Prognosis, such as predicting mortality, is common in medicine. When confronted with small numbers of samples, as in rare medical conditions, the task is challenging. We propose a framework for classification with data with small numbers of samples. Conceptually our solution is a hybrid of multi-task and transfer learning, employing data samples from source tasks as in transfer learning, but considering all tasks together as in multi-task learning. Each task is modelled jointly with other related tasks by directly augmenting the data from other tasks. The degree of augmentation depends on the task relatedness and is estimated directly from the data. We apply the model on three diverse real-world datasets (healthcare data, handwritten digit data and face data) and show that our method outperforms several state-of-the-art multi-task learning baselines. We extend the model for online multi-task learning where the model parameters are incrementally updated given new data or new tasks. The novelty of our method lies in offering a hybrid multi-task/transfer learning model to exploit sharing across tasks at the data level and joint parameter learning.
    @ARTICLE { budhaditya_gupta_venkatesh_phung_kais16multiple,
        AUTHOR = { Saha, Budhaditya and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Multiple Task Transfer Learning with Small Sample Sizes },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2016 },
        VOLUME = { 46 },
        NUMBER = { 2 },
        PAGES = { 315--342 },
        MONTH = { Feb. },
        ABSTRACT = { Prognosis, such as predicting mortality, is common in medicine. When confronted with small numbers of samples, as in rare medical conditions, the task is challenging. We propose a framework for classification with data with small numbers of samples. Conceptually our solution is a hybrid of multi-task and transfer learning, employing data samples from source tasks as in transfer learning, but considering all tasks together as in multi-task learning. Each task is modelled jointly with other related tasks by directly augmenting the data from other tasks. The degree of augmentation depends on the task relatedness and is estimated directly from the data. We apply the model on three diverse real-world datasets (healthcare data, handwritten digit data and face data) and show that our method outperforms several state-of-the-art multi-task learning baselines. We extend the model for online multi-task learning where the model parameters are incrementally updated given new data or new tasks. The novelty of our method lies in offering a hybrid multi-task/transfer learning model to exploit sharing across tasks at the data level and joint parameter learning. },
        DOI = { 10.1007/s10115-015-0821-z },
        FILE = { :budhaditya_gupta_venkatesh_phung_kais16multiple - Multiple Task Transfer Learning with Small Sample Sizes.pdf:PDF },
        KEYWORDS = { Multi-task, Transfer learning, Optimization, Healthcare, Data mining, Statistical analysis },
        OWNER = { dinh },
        TIMESTAMP = { 2015.06.10 },
        URL = { http://link.springer.com/article/10.1007/s10115-015-0821-z },
    }
J
  • Stabilizing L1-norm Prediction Models by Supervised Feature Grouping
    Kamkar, Iman, Gupta, Sunil Kumar, Phung, Dinh and Venkatesh, Svetha. Journal of Biomedical Informatics (JBI), 59(C):149-168, Feb. 2016. [ | | pdf]
    Emerging Electronic Medical Records (EMRs) have reformed modern healthcare. These records have great potential to be used for building clinical prediction models. However, a problem in using them is their high dimensionality. Since a lot of information may not be relevant for prediction, the underlying complexity of the prediction models may not be high. A popular way to deal with this problem is to employ feature selection. Lasso and l1-norm based feature selection methods have shown promising results. However, in the presence of correlated features, these methods select features that change considerably with small changes in the data. This prevents clinicians from obtaining a stable feature set, which is crucial for clinical decision making. Grouping correlated variables together can improve the stability of feature selection; however, such grouping is usually not known and needs to be estimated for optimal performance. Addressing this problem, we propose a new model that can simultaneously learn the grouping of correlated features and perform stable feature selection. We formulate the model as a constrained optimization problem and provide an efficient solution with guaranteed convergence. Our experiments with both synthetic and real-world datasets show that the proposed model is significantly more stable than Lasso and many existing state-of-the-art shrinkage and classification methods. We further show that in terms of prediction performance, the proposed method consistently outperforms Lasso and other baselines. Our model can be used for selecting stable risk factors for a variety of healthcare problems, so it can assist clinicians toward accurate decision making.
    @ARTICLE { kamkar_gupta_phung_venkatesh_16stabilizing,
        AUTHOR = { Kamkar, Iman and Gupta, Sunil Kumar and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Stabilizing L1-norm Prediction Models by Supervised Feature Grouping },
        JOURNAL = { Journal of Biomedical Informatics (JBI) },
        YEAR = { 2016 },
        VOLUME = { 59 },
        NUMBER = { C },
        PAGES = { 149--168 },
        MONTH = { Feb. },
        ISSN = { 1532-0464 },
        ABSTRACT = { Emerging Electronic Medical Records (EMRs) have reformed modern healthcare. These records have great potential to be used for building clinical prediction models. However, a problem in using them is their high dimensionality. Since a lot of information may not be relevant for prediction, the underlying complexity of the prediction models may not be high. A popular way to deal with this problem is to employ feature selection. Lasso and l1-norm based feature selection methods have shown promising results. However, in the presence of correlated features, these methods select features that change considerably with small changes in the data. This prevents clinicians from obtaining a stable feature set, which is crucial for clinical decision making. Grouping correlated variables together can improve the stability of feature selection; however, such grouping is usually not known and needs to be estimated for optimal performance. Addressing this problem, we propose a new model that can simultaneously learn the grouping of correlated features and perform stable feature selection. We formulate the model as a constrained optimization problem and provide an efficient solution with guaranteed convergence. Our experiments with both synthetic and real-world datasets show that the proposed model is significantly more stable than Lasso and many existing state-of-the-art shrinkage and classification methods. We further show that in terms of prediction performance, the proposed method consistently outperforms Lasso and other baselines. Our model can be used for selecting stable risk factors for a variety of healthcare problems, so it can assist clinicians toward accurate decision making. },
        DOI = { 10.1016/j.jbi.2015.11.012 },
        FILE = { :kamkar_gupta_phung_venkatesh_16stabilizing - Stabilizing L1 Norm Prediction Models by Supervised Feature Grouping.pdf:PDF },
        KEYWORDS = { Feature selection, Lasso, Stability, Supervised feature grouping },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://www.sciencedirect.com/science/article/pii/S1532046415002804 },
    }
J
  • Graph-induced restricted Boltzmann machines for document modeling
    Nguyen, Tu Dinh, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. Information Sciences, 328(C):60-75, Jan. 2016. [ | | pdf]
    Discovering knowledge from unstructured texts is a central theme in data mining and machine learning. We focus on fast discovery of thematic structures from a corpus. Our approach is based on a versatile probabilistic formulation – the restricted Boltzmann machine (RBM) – where the underlying graphical model is an undirected bipartite graph. Inference is efficient – document representation can be computed with a single matrix projection, making RBMs suitable for the massive text corpora available today. Standard RBMs, however, operate on a bag-of-words assumption, ignoring the inherent relational structures among words. This results in less coherent thematic word groupings. We introduce graph-based regularization schemes that exploit linguistic structures, which in turn can be constructed from either corpus statistics or domain knowledge. We demonstrate that the proposed technique improves group coherence, facilitates visualization, provides a means for estimating intrinsic dimensionality, reduces overfitting, and possibly leads to better classification accuracy.
    @ARTICLE { nguyen_tran_phung_venkatesh_jis16graph,
        AUTHOR = { Nguyen, Tu Dinh and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Graph-induced restricted {B}oltzmann machines for document modeling },
        JOURNAL = { Information Sciences },
        YEAR = { 2016 },
        VOLUME = { 328 },
        NUMBER = { C },
        PAGES = { 60--75 },
        MONTH = { Jan. },
        ABSTRACT = { Discovering knowledge from unstructured texts is a central theme in data mining and machine learning. We focus on fast discovery of thematic structures from a corpus. Our approach is based on a versatile probabilistic formulation – the restricted Boltzmann machine (RBM) – where the underlying graphical model is an undirected bipartite graph. Inference is efficient – document representation can be computed with a single matrix projection, making RBMs suitable for the massive text corpora available today. Standard RBMs, however, operate on a bag-of-words assumption, ignoring the inherent relational structures among words. This results in less coherent thematic word groupings. We introduce graph-based regularization schemes that exploit linguistic structures, which in turn can be constructed from either corpus statistics or domain knowledge. We demonstrate that the proposed technique improves group coherence, facilitates visualization, provides a means for estimating intrinsic dimensionality, reduces overfitting, and possibly leads to better classification accuracy. },
        DOI = { 10.1016/j.ins.2015.08.023 },
        FILE = { :nguyen_tran_phung_venkatesh_jis16graph - Graph Induced Restricted Boltzmann Machines for Document Modeling.pdf:PDF },
        KEYWORDS = { Document modeling, Feature group discovery, Restricted Boltzmann machine, Topic coherence, Word graphs },
        OWNER = { dinh },
        PUBLISHER = { Elsevier },
        TIMESTAMP = { 2015.09.16 },
        URL = { http://dx.doi.org/10.1016/j.ins.2015.08.023 },
    }
J
2015
  • Differentiating sub-groups of online depression-related communities using textual cues
    Nguyen, Thin, O'Dea, Bridianne, Larsen, Mark, Phung, Dinh, Venkatesh, Svetha and Christensen, Helen. In Intl. Conf. on Web Information Systems Engineering (WISE), pages 216-224, Dec. 2015. [ | | pdf]
    Depression is a highly prevalent mental illness and is a comorbidity of other mental and behavioural disorders. The Internet allows individuals who are depressed or caring for those who are depressed, to connect with others via online communities; however, the characteristics of these online conversations and the language styles of those interested in depression have not yet been fully explored. This work aims to explore the textual cues of online communities interested in depression. A random sample of 5,000 blog posts was crawled. Five groupings were identified: depression, bipolar, self-harm, grief, and suicide. Independent variables included psycholinguistic processes and content topics extracted from the posts. Machine learning techniques were used to discriminate messages posted in the depression sub-group from the others. Good predictive validity in depression classification using topics and psycholinguistic clues as features was found. Clear discrimination between writing styles and content, with good predictive power is an important step in understanding social media and its use in mental health.
    @INPROCEEDINGS { nguyen_odea_larsen_phung_venkatesh_christensen_wise15differentiating,
        AUTHOR = { Nguyen, Thin and O'Dea, Bridianne and Larsen, Mark and Phung, Dinh and Venkatesh, Svetha and Christensen, Helen },
        TITLE = { Differentiating sub-groups of online depression-related communities using textual cues },
        BOOKTITLE = { Intl. Conf. on Web Information Systems Engineering (WISE) },
        YEAR = { 2015 },
        VOLUME = { 9419 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 216--224 },
        MONTH = { Dec. },
        PUBLISHER = { Springer },
        ABSTRACT = { Depression is a highly prevalent mental illness and is a comorbidity of other mental and behavioural disorders. The Internet allows individuals who are depressed or caring for those who are depressed, to connect with others via online communities; however, the characteristics of these online conversations and the language styles of those interested in depression have not yet been fully explored. This work aims to explore the textual cues of online communities interested in depression. A random sample of 5,000 blog posts was crawled. Five groupings were identified: depression, bipolar, self-harm, grief, and suicide. Independent variables included psycholinguistic processes and content topics extracted from the posts. Machine learning techniques were used to discriminate messages posted in the depression sub-group from the others. Good predictive validity in depression classification using topics and psycholinguistic clues as features was found. Clear discrimination between writing styles and content, with good predictive power is an important step in understanding social media and its use in mental health. },
        DOI = { 10.1007/978-3-319-26187-4_17 },
        FILE = { :nguyen_odea_larsen_phung_venkatesh_christensen_wise15differentiating - Differentiating Sub Groups of Online Depression Related Communities Using Textual Cues.pdf:PDF },
        ISBN = { 978-3-319-11748-5 },
        KEYWORDS = { Web community; Feature extraction; Textual cues; Online depression },
        LANGUAGE = { English },
        OWNER = { thinng },
        TIMESTAMP = { 2015.09.16 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-26187-4_17 },
    }
C
  • Using Twitter to learn about the autism community
    Beykikhoshk, Adham, Arandjelovi{\'c}, Ognjen, Phung, Dinh, Venkatesh, Svetha and Caelli, Terry. Social Network Analysis and Mining (SNAM), 5(1):1-17, December 2015. [ | | pdf]
    Considering the rising socio-economic burden of autism spectrum disorder (ASD), timely and evidence-driven public policy decision-making and communication of the latest guidelines pertaining to the treatment and management of the disorder is crucial. Yet evidence suggests that policy makers and medical practitioners do not always have a good understanding of the practices and relevant beliefs of the carers of ASD-afflicted individuals, who often follow questionable recommendations and adopt advice poorly supported by scientific data. The key goal of the present work is to explore the idea that Twitter, as a highly popular platform for information exchange, could be used as a data-mining source to learn about the population affected by ASD---their behaviour, concerns, needs, etc. To this end, using a large data set of over 11 million harvested tweets as the basis for our investigation, we describe a series of experiments which examine a range of linguistic and semantic aspects of messages posted by individuals interested in ASD. Our findings, the first of their kind in the published scientific literature, strongly motivate additional research on this topic and present a methodological basis for further work.
    @ARTICLE { beykikhoshk_arandjelovic_phung_venkatesh_caelli_snaam15using,
        AUTHOR = { Beykikhoshk, Adham and Arandjelovi{\'c}, Ognjen and Phung, Dinh and Venkatesh, Svetha and Caelli, Terry },
        TITLE = { Using {T}witter to learn about the autism community },
        JOURNAL = { Social Network Analysis and Mining (SNAM) },
        YEAR = { 2015 },
        VOLUME = { 5 },
        NUMBER = { 1 },
        PAGES = { 1--17 },
        MONTH = { December },
        ABSTRACT = { Considering the rising socio-economic burden of autism spectrum disorder (ASD), timely and evidence-driven public policy decision-making and communication of the latest guidelines pertaining to the treatment and management of the disorder is crucial. Yet evidence suggests that policy makers and medical practitioners do not always have a good understanding of the practices and relevant beliefs of the carers of ASD-afflicted individuals, who often follow questionable recommendations and adopt advice poorly supported by scientific data. The key goal of the present work is to explore the idea that Twitter, as a highly popular platform for information exchange, could be used as a data-mining source to learn about the population affected by ASD---their behaviour, concerns, needs, etc. To this end, using a large data set of over 11 million harvested tweets as the basis for our investigation, we describe a series of experiments which examine a range of linguistic and semantic aspects of messages posted by individuals interested in ASD. Our findings, the first of their kind in the published scientific literature, strongly motivate additional research on this topic and present a methodological basis for further work. },
        DOI = { 10.1007/s13278-015-0261-5 },
        FILE = { :beykikhoshk_arandjelovic_phung_venkatesh_caelli_snaam15using - Using Twitter to Learn about the Autism Community.pdf:PDF },
        KEYWORDS = { Social media, Big data, Asperger’s, Mental health, Health care, Public health, ASD },
        OWNER = { dinh },
        PUBLISHER = { Springer Vienna },
        TIMESTAMP = { 2015.06.10 },
        URL = { http://dx.doi.org/10.1007/s13278-015-0261-5 },
    }
J
  • Learning Entry Profiles of Children with Autism from Multivariate Treatment Information Using Restricted Boltzmann Machines
    Vellanki, Pratibha, Phung, Dinh, Duong, Thi and Venkatesh, Svetha. In Trends and Applications in Knowledge Discovery and Data Mining, pages 245-257, Cham, Nov. 2015. [ | | pdf]
    @INPROCEEDINGS { vellanki_phung_duong_venkatesh_pakdd2015learning,
        AUTHOR = { Vellanki, Pratibha and Phung, Dinh and Duong, Thi and Venkatesh, Svetha },
        TITLE = { Learning Entry Profiles of Children with Autism from Multivariate Treatment Information Using Restricted {B}oltzmann Machines },
        BOOKTITLE = { Trends and Applications in Knowledge Discovery and Data Mining },
        YEAR = { 2015 },
        VOLUME = { 9441 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 245--257 },
        ADDRESS = { Cham },
        MONTH = { Nov. },
        PUBLISHER = { Springer },
        DOI = { 10.1007/978-3-319-25660-3_21 },
        FILE = { :vellanki_phung_duong_venkatesh_pakdd2015learning - Learning Entry Profiles of Children with Autism from Multivariate Treatment Information Using Restricted Boltzmann Machines.pdf:PDF },
        ISBN = { 978-3-319-25660-3 },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.21 },
        URL = { http://dx.doi.org/10.1007/978-3-319-25660-3_21 },
    }
C
  • Multi-View Subspace Clustering for Face Images
    Zhang, Xin, Phung, Dinh, Venkatesh, Svetha, Pham, Duc-Son and Liu, Wanquan. In Intl. Conf. on Digital Image Computing: Techniques and Applications (DICTA), pages 1-7, Nov. 2015. [ | | pdf]
    In many real-world computer vision applications, such as multi-camera surveillance, the objects of interest are captured by visual sensors concurrently, resulting in multi-view data. These views usually provide complementary information to each other. One recent and powerful computer vision method for clustering is sparse subspace clustering (SSC); however, it was not designed for multi-view data, which breaks its linear separability assumption. To integrate complementary information between views, multi-view clustering algorithms are required to improve the clustering performance. In this paper, we propose a novel multi-view subspace clustering method that searches for a unified latent structure as a global affinity matrix in subspace clustering. Because it integrates the affinity matrices of each view, this global affinity matrix can best represent the relationships between clusters, which helps us achieve better performance on face clustering. We derive a provably convergent algorithm based on the alternating direction method of multipliers (ADMM) framework, which is computationally efficient, to solve the formulation. We demonstrate that this formulation outperforms state-of-the-art alternatives on challenging multi-view face datasets.
    @INPROCEEDINGS { zhang_phung_venkatesh_pham_liu_dicta15multiview,
        AUTHOR = { Zhang, Xin and Phung, Dinh and Venkatesh, Svetha and Pham, Duc-Son and Liu, Wanquan },
        TITLE = { Multi-View Subspace Clustering for Face Images },
        BOOKTITLE = { Intl. Conf. on Digital Image Computing: Techniques and Applications (DICTA) },
        YEAR = { 2015 },
        PAGES = { 1--7 },
        MONTH = { Nov. },
        ABSTRACT = { In many real-world computer vision applications, such as multi-camera surveillance, the objects of interest are captured by visual sensors concurrently, resulting in multi-view data. These views usually provide complementary information to each other. One recent and powerful computer vision method for clustering is sparse subspace clustering (SSC); however, it was not designed for multi-view data, which breaks its linear separability assumption. To integrate complementary information between views, multi-view clustering algorithms are required to improve the clustering performance. In this paper, we propose a novel multi-view subspace clustering method that searches for a unified latent structure as a global affinity matrix in subspace clustering. Because it integrates the affinity matrices of each view, this global affinity matrix can best represent the relationships between clusters, which helps us achieve better performance on face clustering. We derive a provably convergent algorithm based on the alternating direction method of multipliers (ADMM) framework, which is computationally efficient, to solve the formulation. We demonstrate that this formulation outperforms state-of-the-art alternatives on challenging multi-view face datasets. },
        DOI = { 10.1109/DICTA.2015.7371289 },
        FILE = { :zhang_phung_venkatesh_pham_liu_dicta15multiview - Multi View Subspace Clustering for Face Images.pdf:PDF },
        KEYWORDS = { computer vision;face recognition;pattern clustering;ADMM framework;SSC;affinity matrices;alternating direction method;computer vision applications;computer vision method;convergent algorithm;face clustering;face images;global affinity matrix;latent structure;linear separability assumption;multicamera surveillance;multipliers;multiview data;multiview face datasets;multiview subspace clustering algorithms;sparse subspace clustering performance;visual sensors;Cameras;Clustering algorithms;Computer vision;Face;Loss measurement;Matrix decomposition;Sparse matrices },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.21 },
        URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7371289 },
    }
C
  • Streaming Variational Inference for Dirichlet Process Mixtures
    Huynh, V., Phung, D. and Venkatesh, S. In 7th Asian Conference on Machine Learning (ACML), pages 237-252, Nov. 2015. [ | | pdf]
    Bayesian nonparametric models are theoretically suitable for learning from streaming data because their complexity adapts to the volume of observed data. However, most existing variational inference algorithms are not applicable to streaming applications since they require truncation of the variational distributions. In this paper, we present two truncation-free variational algorithms: one for mixed-membership inference called TFVB (truncation-free variational Bayes), and the other for hard clustering inference called TFME (truncation-free maximization expectation). With these algorithms, we further develop a streaming learning framework for the popular Dirichlet process mixture (DPM) models. Our experiments demonstrate the usefulness of our framework on both synthetic and real-world data.
    @INPROCEEDINGS { huynh_phung_venkatesh_15streaming,
        AUTHOR = { Huynh, V. and Phung, D. and Venkatesh, S. },
        TITLE = { Streaming Variational Inference for {D}irichlet {P}rocess {M}ixtures },
        BOOKTITLE = { 7th Asian Conference on Machine Learning (ACML) },
        YEAR = { 2015 },
        PAGES = { 237--252 },
        MONTH = { Nov. },
        ABSTRACT = { Bayesian nonparametric models are theoretically suitable for learning from streaming data because their complexity adapts to the volume of observed data. However, most existing variational inference algorithms are not applicable to streaming applications since they require truncation of the variational distributions. In this paper, we present two truncation-free variational algorithms: one for mixed-membership inference called TFVB (truncation-free variational Bayes), and the other for hard clustering inference called TFME (truncation-free maximization expectation). With these algorithms, we further develop a streaming learning framework for the popular Dirichlet process mixture (DPM) models. Our experiments demonstrate the usefulness of our framework on both synthetic and real-world data. },
        FILE = { :huynh_phung_venkatesh_15streaming - Streaming Variational Inference for Dirichlet Process Mixtures.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.06 },
        URL = { http://www.jmlr.org/proceedings/papers/v45/Huynh15.pdf },
    }
C
  • Understanding toxicities and complications of cancer treatment: A data mining approach
    Nguyen, Dang, Luo, Wei, Phung, Dinh and Venkatesh, Svetha. In 28th Australasian Joint Conference on Artificial Intelligence (AI), pages 431-443, Nov 2015. [ | | pdf]
    @INPROCEEDINGS { nguyen_luo_phung_venkatesh_ai15understanding,
        AUTHOR = { Nguyen, Dang and Luo, Wei and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Understanding toxicities and complications of cancer treatment: A data mining approach },
        BOOKTITLE = { 28th Australasian Joint Conference on Artificial Intelligence (AI) },
        YEAR = { 2015 },
        EDITOR = { Pfahringer, Bernhard and Renz, Jochen },
        VOLUME = { 9457 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 431--443 },
        MONTH = { Nov },
        PUBLISHER = { Springer International Publishing },
        DOI = { 10.1007/978-3-319-26350-2_38 },
        FILE = { :nguyen_luo_phung_venkatesh_ai15understanding - Understanding Toxicities and Complications of Cancer Treatment_ a Data Mining Approach.pdf:PDF },
        LOCATION = { Canberra, ACT, Australia },
        OWNER = { ngdang },
        TIMESTAMP = { 2015.09.15 },
        URL = { http://dx.doi.org/10.1007/978-3-319-26350-2_38 },
    }
C
  • Stable Feature Selection with Support Vector Machines
    Kamkar, Iman, Gupta, Sunil Kumar, Phung, Dinh and Venkatesh, Svetha. In 28th Australasian Joint Conference on Artificial Intelligence (AI), pages 298-308, Cham, Nov. 2015. [ | | pdf]
    The support vector machine (SVM) is a popular method for classification, well known for finding the maximum-margin hyperplane. Combining the SVM with an l1-norm penalty further enables it to simultaneously perform feature selection and margin maximization within a single framework. However, the l1-norm SVM is unstable in selecting features in the presence of correlated features. We propose a new method to increase the stability of the l1-norm SVM by encouraging similarities between feature weights based on feature correlations, which are captured via a feature covariance matrix. Our proposed method can capture both positive and negative correlations between features. We formulate the model as a convex optimization problem and propose a solution based on alternating minimization. Using both synthetic and real-world datasets, we show that our model achieves better stability and classification accuracy compared to several state-of-the-art regularized classification methods.
    @INPROCEEDINGS { kamkar_gupta_phung_venkatesh_ai15stable,
        AUTHOR = { Kamkar, Iman and Gupta, Sunil Kumar and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Stable Feature Selection with {S}upport {V}ector {M}achines },
        BOOKTITLE = { 28th Australasian Joint Conference on Artificial Intelligence (AI) },
        YEAR = { 2015 },
        EDITOR = { Pfahringer, Bernhard and Renz, Jochen },
        VOLUME = { 9457 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 298--308 },
        ADDRESS = { Cham },
        MONTH = { Nov. },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { The support vector machine (SVM) is a popular method for classification, well known for finding the maximum-margin hyperplane. Combining SVM with the l1-norm penalty further enables it to simultaneously perform feature selection and margin maximization within a single framework. However, l1-norm SVM shows instability in selecting features in the presence of correlated features. We propose a new method to increase the stability of l1-norm SVM by encouraging similarities between feature weights based on feature correlations, which are captured via a feature covariance matrix. Our proposed method can capture both positive and negative correlations between features. We formulate the model as a convex optimization problem and propose a solution based on alternating minimization. Using both synthetic and real-world datasets, we show that our model achieves better stability and classification accuracy compared to several state-of-the-art regularized classification methods. },
        DOI = { 10.1007/978-3-319-26350-2_26 },
        FILE = { :kamkar_gupta_phung_venkatesh_ai15stable - Stable Feature Selection with Support Vector Machines.pdf:PDF },
        ISBN = { 978-3-319-26350-2 },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.21 },
        URL = { http://dx.doi.org/10.1007/978-3-319-26350-2_26 },
    }
C
  • Exploiting Feature Relationships Towards Stable Feature Selection
    Kamkar, Iman, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. In Intl. Conf. on Data Science and Advanced Analytics (DSAA), pages 1-10, Paris, France, Oct. 2015. [ | | pdf]
    Feature selection is an important step in building predictive models for most real-world problems. One of the popular methods in feature selection is Lasso. However, it shows instability in selecting features when dealing with correlated features. In this work, we propose a new method that aims to increase the stability of Lasso by encouraging similarities between features based on their relatedness, which is captured via a feature covariance matrix. Besides modeling positive feature correlations, our method can also identify negative correlations between features. We propose a convex formulation for our model along with an alternating optimization algorithm that can learn the weights of the features as well as the relationship between them. Using both synthetic and real-world data, we show that the proposed method is more stable than Lasso and many state-of-the-art shrinkage and feature selection methods. Also, its predictive performance is comparable to other methods.
    @INPROCEEDINGS { kamkar_gupta_phung_venkatesh_dsaa15,
        AUTHOR = { Kamkar, Iman and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Exploiting Feature Relationships Towards Stable Feature Selection },
        BOOKTITLE = { Intl. Conf. on Data Science and Advanced Analytics (DSAA) },
        YEAR = { 2015 },
        PAGES = { 1--10 },
        ADDRESS = { Paris, France },
        MONTH = { Oct. },
        ABSTRACT = { Feature selection is an important step in building predictive models for most real-world problems. One of the popular methods in feature selection is Lasso. However, it shows instability in selecting features when dealing with correlated features. In this work, we propose a new method that aims to increase the stability of Lasso by encouraging similarities between features based on their relatedness, which is captured via a feature covariance matrix. Besides modeling positive feature correlations, our method can also identify negative correlations between features. We propose a convex formulation for our model along with an alternating optimization algorithm that can learn the weights of the features as well as the relationship between them. Using both synthetic and real-world data, we show that the proposed method is more stable than Lasso and many state-of-the-art shrinkage and feature selection methods. Also, its predictive performance is comparable to other methods. },
        DOI = { 10.1109/DSAA.2015.7344859 },
        FILE = { :kamkar_gupta_phung_venkatesh_dsaa15 - Exploiting Feature Relationships Towards Stable Feature Selection.pdf:PDF },
        KEYWORDS = { convex programming;covariance matrices;feature selection;Lasso stability;convex formulation;correlated feature;feature covariance matrix;feature relationship;feature selection method;negative correlation;optimization algorithm;positive feature correlation;predictive model;real-world data;shrinkage;stable feature selection;synthetic data;Correlation;Covariance matrices;Linear programming;Optimization;Predictive models;Stability criteria;Correlated features;Lasso;Prediction;Stability },
        OWNER = { ikamkar },
        TIMESTAMP = { 2015.09.16 },
        URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7344859 },
    }
C
  • Nonparametric Discovery of Online Mental Health-Related Communities
    Dao, Bo, Nguyen, Thin, Venkatesh, Svetha and Phung, Dinh. In Intl. Conf. on Data Science and Advanced Analytics (DSAA), pages 1-10, Paris, France, Oct. 2015. (IEEE CIS Travel Grants Award). [ | | pdf]
    @INPROCEEDINGS { dao_nguyen_venkatesh_phung_dsaa15,
        AUTHOR = { Dao, Bo and Nguyen, Thin and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Nonparametric Discovery of Online Mental Health-Related Communities },
        BOOKTITLE = { Intl. Conf. on Data Science and Advanced Analytics (DSAA) },
        YEAR = { 2015 },
        PAGES = { 1--10 },
        ADDRESS = { Paris, France },
        MONTH = { Oct. },
        PUBLISHER = { IEEE },
        NOTE = { IEEE CIS Travel Grants Award },
        DOI = { 10.1109/DSAA.2015.7344841 },
        FILE = { :dao_nguyen_venkatesh_phung_dsaa15 - Nonparametric Discovery of Online Mental Health Related Communities.pdf:PDF },
        KEYWORDS = { cognition;health care;nonparametric statistics;pattern clustering;social networking (online);cognitive dynamics;mood swings patterns;nonparametric clustering;nonparametric discovery;nonparametric topic modelling;online communities;online mental health-related communities;social media;Autism;Blogs;Media;Mood;Sentiment analysis;Variable speed drives;Mental Health;Moods and Emotion;Nonparametric Discovery;Online Communities;Social Media;Topics },
        OWNER = { dbdao },
        TIMESTAMP = { 2015.07.23 },
        URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7344841 },
    }
C
  • Mixed-norm sparse representation for multi view face recognition
    Zhang, Xin, Pham, Duc-Son, Venkatesh, Svetha, Liu, Wanquan and Phung, Dinh. Pattern Recognition, 48(9):2935-2946, Sep. 2015. [ | | pdf]
    @ARTICLE { zhang_pham_venkatesh_liu_phung_pr15mixed,
        AUTHOR = { Zhang, Xin and Pham, Duc-Son and Venkatesh, Svetha and Liu, Wanquan and Phung, Dinh },
        TITLE = { Mixed-norm sparse representation for multi view face recognition },
        JOURNAL = { Pattern Recognition },
        YEAR = { 2015 },
        VOLUME = { 48 },
        NUMBER = { 9 },
        PAGES = { 2935--2946 },
        MONTH = { Sep. },
        DOI = { 10.1016/j.patcog.2015.02.022 },
        FILE = { :zhang_pham_venkatesh_liu_phung_pr15mixed - Mixed Norm Sparse Representation for Multi View Face Recognition.pdf:PDF },
        KEYWORDS = { ADMM, Convex optimization, Group sparse representation, Joint dynamic sparse representation classification, Multi-pose face recognition, Multi-task learning, Robust face recognition, Sparse representation classification, Unsupervised learning },
        OWNER = { dinh },
        PUBLISHER = { Pergamon },
        TIMESTAMP = { 2015.09.16 },
        URL = { http://dl.acm.org/citation.cfm?id=2792197 },
    }
J
  • Overcoming Data Scarcity of Twitter: Using Tweets As Bootstrap with Application to Autism-Related Topic Content Analysis
    Beykikhoshk, Adham, Arandjelović, Ognjen, Phung, Dinh and Venkatesh, Svetha. In IEEE/ACM Intl. Conf. on Advances in Social Networks Analysis and Mining (ASONAM), pages 1354-1361, New York, NY, USA, Aug. 2015. [ | | pdf]
    Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140 character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags.
    @INPROCEEDINGS { beykikhoshk_arandjelovic_phung_venkatesh_asonam15overcoming,
        AUTHOR = { Beykikhoshk, Adham and Arandjelovi\'{c}, Ognjen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Overcoming Data Scarcity of {T}witter: Using Tweets As Bootstrap with Application to Autism-Related Topic Content Analysis },
        BOOKTITLE = { IEEE/ACM Intl. Conf. on Advances in Social Networks Analysis and Mining (ASONAM) },
        YEAR = { 2015 },
        SERIES = { ASONAM '15 },
        PAGES = { 1354--1361 },
        ADDRESS = { New York, NY, USA },
        MONTH = { Aug. },
        PUBLISHER = { ACM },
        ABSTRACT = { Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140 character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags. },
        ACMID = { 2808908 },
        DOI = { 10.1145/2808797.2808908 },
        FILE = { :beykikhoshk_arandjelovic_phung_venkatesh_asonam15overcoming - Overcoming Data Scarcity of Twitter_ Using Tweets As Bootstrap with Application to Autism Related Topic Content Analysis.pdf:PDF },
        ISBN = { 978-1-4503-3854-7 },
        LOCATION = { Paris, France },
        NUMPAGES = { 8 },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.05.21 },
        URL = { http://doi.acm.org/10.1145/2808797.2808908 },
    }
C
  • Autism Blogs: Expressed Emotion, Language Styles and Concerns in Personal and Community Settings
    Nguyen, Thin, Duong, Thi, Venkatesh, Svetha and Phung, Dinh. IEEE Transactions on Affective Computing (TAC), 6(3):312-323, July 2015. [ | | pdf]
    The Internet has provided an increasingly popular platform for individuals to voice their thoughts, and like-minded people to share stories. This unintentionally leaves characteristics of individuals and communities, which are often difficult to collect in traditional studies. Individuals with autism are such a case, in which the Internet could facilitate even more communication given its social-spatial distance being a characteristic preference for individuals with autism. Previous studies examined the traces left in the posts of online autism communities (Autism) in comparison with other online communities (Control). This work further investigates these online populations through the contents of not only their posts but also their comments. We first compare the Autism and Control blogs based on three features: topics, language styles and affective information. The autism groups are then further examined, based on the same three features, by looking at their personal (Personal) and community (Community) blogs separately. Machine learning and statistical methods are used to discriminate blog contents in both cases. All three features are found to be significantly different between Autism and Control, and between autism Personal and Community. These features also show good indicative power in prediction of autism blogs in both personal and community settings.
    @ARTICLE { nguyen_duong_venkatesh_phung_tac15,
        AUTHOR = { Nguyen, Thin and Duong, Thi and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Autism Blogs: Expressed Emotion, Language Styles and Concerns in Personal and Community Settings },
        JOURNAL = { IEEE Transactions on Affective Computing (TAC) },
        YEAR = { 2015 },
        VOLUME = { 6 },
        NUMBER = { 3 },
        PAGES = { 312-323 },
        MONTH = { July },
        ISSN = { 1949-3045 },
        ABSTRACT = { The Internet has provided an increasingly popular platform for individuals to voice their thoughts, and like-minded people to share stories. This unintentionally leaves characteristics of individuals and communities, which are often difficult to collect in traditional studies. Individuals with autism are such a case, in which the Internet could facilitate even more communication given its social-spatial distance being a characteristic preference for individuals with autism. Previous studies examined the traces left in the posts of online autism communities (Autism) in comparison with other online communities (Control). This work further investigates these online populations through the contents of not only their posts but also their comments. We first compare the Autism and Control blogs based on three features: topics, language styles and affective information. The autism groups are then further examined, based on the same three features, by looking at their personal (Personal) and community (Community) blogs separately. Machine learning and statistical methods are used to discriminate blog contents in both cases. All three features are found to be significantly different between Autism and Control, and between autism Personal and Community. These features also show good indicative power in prediction of autism blogs in both personal and community settings. },
        DOI = { 10.1109/TAFFC.2015.2400912 },
        FILE = { :nguyen_duong_venkatesh_phung_tac15 - Autism Blogs_ Expressed Emotion, Language Styles and Concerns in Personal and Community Settings.pdf:PDF },
        KEYWORDS = { Web sites;human factors;learning (artificial intelligence);statistical analysis;Internet;affective information;autism blogs;blog content discrimination;community setting;control blogs;language styles;machine learning;online autism communities;personal setting;social-spatial distance;statistical methods;topics;Autism;Blogs;Communities;Educational institutions;Feature extraction;Sociology;Variable speed drives;Affective norms;affective norms;autism;language styles;psychological health;topics },
        OWNER = { thinng },
        TIMESTAMP = { 2015.01.28 },
        URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7034996 },
    }
J
  • Stabilized Sparse Ordinal Regression for Medical Risk Stratification
    Tran, Truyen, Phung, Dinh, Luo, Wei and Venkatesh, Svetha. Knowledge and Information Systems (KAIS), 43(3):555-582, June 2015. [ | | pdf]
    The recent wide adoption of Electronic Medical Records (EMR) presents great opportunities and challenges for data mining. The EMR data is largely temporal, often noisy, irregular and high dimensional. This paper constructs a novel ordinal regression framework for predicting medical risk stratification from EMR. First, a conceptual view of EMR as a temporal image is constructed to extract a diverse set of features. Second, ordinal modeling is applied for predicting cumulative or progressive risk. The challenges are building a transparent predictive model that works with a large number of weakly predictive features, and at the same time, is stable against resampling variations. Our solution employs sparsity methods that are stabilized through domain-specific feature interaction networks. We introduce two indices that measure the model stability against data resampling. Feature networks are used to generate two multivariate Gaussian priors with sparse precision matrices (the Laplacian and Random Walk). We apply the framework on a large short-term suicide risk prediction problem and demonstrate that our methods outperform clinicians by a large margin, discover suicide risk factors that conform with mental health knowledge, and produce models with enhanced stability.
    @ARTICLE { tran_phung_luo_venkatesh_kais15stabilized,
        AUTHOR = { Tran, Truyen and Phung, Dinh and Luo, Wei and Venkatesh, Svetha },
        TITLE = { Stabilized Sparse Ordinal Regression for Medical Risk Stratification },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2015 },
        VOLUME = { 43 },
        NUMBER = { 3 },
        PAGES = { 555--582 },
        MONTH = { June },
        ABSTRACT = { The recent wide adoption of Electronic Medical Records (EMR) presents great opportunities and challenges for data mining. The EMR data is largely temporal, often noisy, irregular and high dimensional. This paper constructs a novel ordinal regression framework for predicting medical risk stratification from EMR. First, a conceptual view of EMR as a temporal image is constructed to extract a diverse set of features. Second, ordinal modeling is applied for predicting cumulative or progressive risk. The challenges are building a transparent predictive model that works with a large number of weakly predictive features, and at the same time, is stable against resampling variations. Our solution employs sparsity methods that are stabilized through domain-specific feature interaction networks. We introduce two indices that measure the model stability against data resampling. Feature networks are used to generate two multivariate Gaussian priors with sparse precision matrices (the Laplacian and Random Walk). We apply the framework on a large short-term suicide risk prediction problem and demonstrate that our methods outperform clinicians by a large margin, discover suicide risk factors that conform with mental health knowledge, and produce models with enhanced stability. },
        DOI = { 10.1007/s10115-014-0740-4 },
        FILE = { :Tran2015_Article_StabilizedSparseOrdinalRegress.pdf:PDF },
        KEYWORDS = { Medical risk stratification; Sparse ordinal regression; Stability; Feature graph; Electronic medical record },
        OWNER = { dinh },
        TIMESTAMP = { 2014.01.28 },
        URL = { http://link.springer.com/article/10.1007%2Fs10115-014-0740-4 },
    }
J
  • A predictive framework for modeling healthcare data with evolving clinical interventions
    Rana, Santu, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. Statistical Analysis and Data Mining: The ASA Data Science Journal, 8(3):162-182, June 2015. [ | | pdf]
    Medical interventions critically determine clinical outcomes. But prediction models either ignore interventions or dilute impact by building a single prediction rule by amalgamating interventions with other features. One rule across all interventions may not capture differential effects. Also, interventions change with time as innovations are made, requiring prediction models to evolve over time. To address these gaps, we propose a prediction framework that explicitly models interventions by extracting a set of latent intervention groups through a Hierarchical Dirichlet Process (HDP) mixture. Data are split in temporal windows and for each window, a separate distribution over the intervention groups is learnt. This ensures that the model evolves with changing interventions. The outcome is modeled as conditional, on both the latent grouping and the patients' condition, through a Bayesian logistic regression. Learning distributions for each time-window results in an over-complex model when interventions do not change in every time-window. We show that by replacing HDP with a dynamic HDP prior, a more compact set of distributions can be learnt. Experiments performed on two hospital datasets demonstrate the superiority of our framework over many existing clinical and traditional prediction frameworks.
    @ARTICLE { rana_gupta_phung_venkatesh_sdm15predictive,
        AUTHOR = { Rana, Santu and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { A predictive framework for modeling healthcare data with evolving clinical interventions },
        JOURNAL = { Statistical Analysis and Data Mining: The ASA Data Science Journal },
        YEAR = { 2015 },
        VOLUME = { 8 },
        NUMBER = { 3 },
        PAGES = { 162--182 },
        MONTH = { June },
        ABSTRACT = { Medical interventions critically determine clinical outcomes. But prediction models either ignore interventions or dilute impact by building a single prediction rule by amalgamating interventions with other features. One rule across all interventions may not capture differential effects. Also, interventions change with time as innovations are made, requiring prediction models to evolve over time. To address these gaps, we propose a prediction framework that explicitly models interventions by extracting a set of latent intervention groups through a Hierarchical Dirichlet Process (HDP) mixture. Data are split in temporal windows and for each window, a separate distribution over the intervention groups is learnt. This ensures that the model evolves with changing interventions. The outcome is modeled as conditional, on both the latent grouping and the patients' condition, through a Bayesian logistic regression. Learning distributions for each time-window results in an over-complex model when interventions do not change in every time-window. We show that by replacing HDP with a dynamic HDP prior, a more compact set of distributions can be learnt. Experiments performed on two hospital datasets demonstrate the superiority of our framework over many existing clinical and traditional prediction frameworks. },
        DOI = { 10.1002/sam.11262 },
        FILE = { :rana_gupta_phung_venkatesh_sdm15predictive - A Predictive Framework for Modeling Healthcare Data with Evolving Clinical Interventions.pdf:PDF },
        KEYWORDS = { data mining, machine learning, healthcare data modeling },
        OWNER = { dinh },
        PUBLISHER = { Wiley Subscription Services, Inc., A Wiley Company },
        TIMESTAMP = { 2015.06.10 },
        URL = { http://dx.doi.org/10.1002/sam.11262 },
    }
J
  • Stabilizing High-Dimensional Prediction Models Using Feature Graphs
    Gopakumar, Shivapratap, Tran, Truyen, Nguyen, Tu, Phung, Dinh and Venkatesh, Svetha. IEEE Journal of Biomedical and Health Informatics (JBHI), 19(3):1044-1052, May 2015. [ | | pdf]
    We investigate feature stability in the context of clinical prognosis derived from high-dimensional electronic medical records. To reduce variance in the selected features that are predictive, we introduce Laplacian-based regularization into a regression model. The Laplacian is derived on a feature graph that captures both the temporal and hierarchic relations between hospital events, diseases, and interventions. Using a cohort of patients with heart failure, we demonstrate better feature stability and goodness-of-fit through feature graph stabilization.
    @ARTICLE { gopakumar_tran_nguyen_phung_venkatesh_bhi15stabilizing,
        AUTHOR = { Gopakumar, Shivapratap and Tran, Truyen and Nguyen, Tu and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Stabilizing High-Dimensional Prediction Models Using Feature Graphs },
        JOURNAL = { IEEE Journal of Biomedical and Health Informatics (JBHI) },
        YEAR = { 2015 },
        VOLUME = { 19 },
        NUMBER = { 3 },
        PAGES = { 1044--1052 },
        MONTH = { May },
        ISSN = { 2168-2194 },
        ABSTRACT = { We investigate feature stability in the context of clinical prognosis derived from high-dimensional electronic medical records. To reduce variance in the selected features that are predictive, we introduce Laplacian-based regularization into a regression model. The Laplacian is derived on a feature graph that captures both the temporal and hierarchic relations between hospital events, diseases, and interventions. Using a cohort of patients with heart failure, we demonstrate better feature stability and goodness-of-fit through feature graph stabilization. },
        DOI = { 10.1109/JBHI.2014.2353031 },
        FILE = { :gopakumar_tran_nguyen_phung_venkatesh_bhi15stabilizing - Stabilizing High Dimensional Prediction Models Using Feature Graphs.pdf:PDF },
        KEYWORDS = { Laplace equations;cardiology;diseases;electronic health records;feature selection;graphs;medical diagnostic computing;regression analysis;Laplacian-based regularization;clinical prognosis;diseases;feature graph stabilization;goodness-of-fit;heart failure;hierarchic relations;high-dimensional electronic medical records;hospital events;interventions;regression model;selected features;stabilizing high-dimensional prediction models;temporal relations;Data models;Feature extraction;Heart;Indexes;Predictive models;Stability criteria;Biomedical computing;electronic medical records;predictive models;stability },
        OWNER = { thinng },
        TIMESTAMP = { 2015.01.29 },
        URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6887285 },
    }
J
  • A Bayesian Nonparametric Approach to Multilevel Regression
    Nguyen, V., Phung, D., Venkatesh, S. and Bui, H.H. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 330-342, May 2015. [ | | pdf]
    Regression is a cornerstone of statistical analysis. Multilevel regression, on the other hand, receives little research attention, though it is prevalent in economics, biostatistics and healthcare, to name a few fields. We present a Bayesian nonparametric framework for multilevel regression where individuals, including observations and outcomes, are organized into groups. Furthermore, our approach exploits additional group-specific context observations: we use a Dirichlet Process with a product-space base measure in a nested structure to model the group-level context distribution and the regression distribution, accommodating the multilevel structure of the data. The proposed model simultaneously partitions groups into clusters and performs regression. We provide a collapsed Gibbs sampler for posterior inference. We perform extensive experiments on econometric panel data and healthcare longitudinal data to demonstrate the effectiveness of the proposed model.
    @INPROCEEDINGS { nguyen_phung_venkatesh_bui_pakdd15,
        AUTHOR = { Nguyen, V. and Phung, D. and Venkatesh, S. and Bui, H.H. },
        TITLE = { A {B}ayesian Nonparametric Approach to Multilevel Regression },
        BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2015 },
        PAGES = { 330--342 },
        MONTH = { May },
        ABSTRACT = { Regression is a cornerstone of statistical analysis. Multilevel regression, on the other hand, receives little research attention, though it is prevalent in economics, biostatistics and healthcare, to name a few fields. We present a Bayesian nonparametric framework for multilevel regression where individuals, including observations and outcomes, are organized into groups. Furthermore, our approach exploits additional group-specific context observations: we use a Dirichlet Process with a product-space base measure in a nested structure to model the group-level context distribution and the regression distribution, accommodating the multilevel structure of the data. The proposed model simultaneously partitions groups into clusters and performs regression. We provide a collapsed Gibbs sampler for posterior inference. We perform extensive experiments on econometric panel data and healthcare longitudinal data to demonstrate the effectiveness of the proposed model. },
        DOI = { 10.1007/978-3-319-18038-0_26 },
        FILE = { :nguyen_phung_venkatesh_bui_pakdd15 - A Bayesian Nonparametric Approach to Multilevel Regression.pdf:PDF },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.02.08 },
        URL = { http://link.springer.com/chapter/10.1007%2F978-3-319-18038-0_26 },
    }
C
  • Hierarchical Dirichlet Process for Tracking Complex Topical Structure Evolution and Its Application to Autism Research Literature
    Beykikhoshk, Adham, Arandjelović, Ognjen, Venkatesh, Svetha and Phung, Dinh. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 550-562, Ho Chi Minh City, Vietnam, May 2015. [ | | pdf]
    In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work our model does not impose a prior on the rate at which documents are added to the corpus nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, splitting and merging. The power of the proposed framework is demonstrated on the medical literature corpus concerned with the autism spectrum disorder (ASD) – an increasingly active research subject of significant social and healthcare importance. In addition to the collected ASD literature corpus which we made freely available, our contributions also include two free online tools we built as aids to ASD researchers. These can be used for semantically meaningful navigation and searching, as well as knowledge discovery from this large and rapidly growing corpus of literature.
    @INPROCEEDINGS { beykikhoshk_arandjelovic_venkatesh_phung_pakdd15,
        AUTHOR = { Beykikhoshk, Adham and Arandjelovi{\'{c}}, Ognjen and Venkatesh, Svetha and Phung, Dinh },
        TITLE = { Hierarchical {D}irichlet Process for Tracking Complex Topical Structure Evolution and Its Application to Autism Research Literature },
        BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2015 },
        EDITOR = { Cao, Tru and Lim, Ee-Peng and Zhou, Zhi-Hua and Ho, Tu-Bao and Cheung, David and Motoda, Hiroshi },
        VOLUME = { 9077 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 550--562 },
        ADDRESS = { Ho Chi Minh City, Vietnam },
        MONTH = { May },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work our model does not impose a prior on the rate at which documents are added to the corpus nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, splitting and merging. The power of the proposed framework is demonstrated on the medical literature corpus concerned with the autism spectrum disorder (ASD) – an increasingly important research subject of significant social and healthcare importance. In addition to the collected ASD literature corpus which we made freely available, our contributions also include two free online tools we built as aids to ASD researchers. These can be used for semantically meaningful navigation and searching, as well as knowledge discovery from this large and rapidly growing corpus of literature. },
        DOI = { 10.1007/978-3-319-18038-0_43 },
        FILE = { :beykikhoshk_arandjelovic_venkatesh_phung_pakdd15 - Hierarchical Dirichlet Process for Tracking Complex Topical Structure Evolution and Its Application to Autism Research Literature.pdf:PDF },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.02.08 },
        URL = { http://dx.doi.org/10.1007/978-3-319-18038-0_43 },
    }
C
  • Stabilizing Sparse Cox Model using Statistic and Semantic Structures in Electronic Medical Records
    Gopakumar, Shivapratap, Nguyen, Tu Dinh, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 331-343, Ho Chi Minh City, Vietnam, May 2015. (Runner-up Best Student Paper Award). [ | | pdf]
    Stability in clinical prediction models is crucial for transferability between studies, yet has received little attention. The problem is paramount in high dimensional data, which invites sparse models with feature selection capability. We introduce an effective method to stabilize sparse Cox model of time-to-events using statistical and semantic structures inherent in Electronic Medical Records (EMR). Model estimation is stabilized using three feature graphs built from (i) Jaccard similarity among features (ii) aggregation of Jaccard similarity graph and a recently introduced semantic EMR graph (iii) Jaccard similarity among features transferred from a related cohort. Our experiments are conducted on two real world hospital datasets: a heart failure cohort and a diabetes cohort. On two stability measures – the Consistency index and signal-to-noise ratio (SNR) – the use of our proposed methods significantly increased feature stability when compared with the baselines.
    @INPROCEEDINGS { gopakumar_nguyen_tran_phung_venkatesh_pakdd15,
        AUTHOR = { Gopakumar, Shivapratap and Nguyen, Tu Dinh and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Stabilizing Sparse {C}ox Model using Statistic and Semantic Structures in Electronic Medical Records },
        BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2015 },
        EDITOR = { Cao, Tru and Lim, Ee-Peng and Zhou, Zhi-Hua and Ho, Tu-Bao and Cheung, David and Motoda, Hiroshi },
        VOLUME = { 9078 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 331--343 },
        ADDRESS = { Ho Chi Minh City, Vietnam },
        MONTH = { May },
        PUBLISHER = { Springer International Publishing },
        NOTE = { Runner-up Best Student Paper Award },
        ABSTRACT = { Stability in clinical prediction models is crucial for transferability between studies, yet has received little attention. The problem is paramount in high dimensional data, which invites sparse models with feature selection capability. We introduce an effective method to stabilize sparse Cox model of time-to-events using statistical and semantic structures inherent in Electronic Medical Records (EMR). Model estimation is stabilized using three feature graphs built from (i) Jaccard similarity among features (ii) aggregation of Jaccard similarity graph and a recently introduced semantic EMR graph (iii) Jaccard similarity among features transferred from a related cohort. Our experiments are conducted on two real world hospital datasets: a heart failure cohort and a diabetes cohort. On two stability measures – the Consistency index and signal-to-noise ratio (SNR) – the use of our proposed methods significantly increased feature stability when compared with the baselines. },
        DOI = { 10.1007/978-3-319-18032-8_26 },
        FILE = { :gopakumar_nguyen_tran_phung_venkatesh_pakdd15 - Stabilizing Sparse Cox Model Using Statistic and Semantic Structures in Electronic Medical Records.pdf:PDF },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.02.08 },
        URL = { http://link.springer.com/chapter/10.1007%2F978-3-319-18032-8_26 },
    }
C
  • Collaborating Differently on Different Topics: A Multi-Relational Approach to Multi-Task Learning
    Gupta, Sunil Kumar, Rana, Santu, Phung, Dinh and Venkatesh, Svetha. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 303-316, Ho Chi Minh City, Vietnam, May 2015. (Best Paper Award). [ | | pdf]
    Multi-task learning offers a way to benefit from synergy of multiple related prediction tasks via their joint modeling. Current multi-task techniques model related tasks jointly, assuming that the tasks share the same relationship across features uniformly. This assumption is seldom true as tasks may be related across some features but not others. Addressing this problem, we propose a new multi-task learning model that learns separate task relationships along different features. This added flexibility allows our model to have a finer and differential level of control in joint modeling of tasks along different features. We formulate the model as an optimization problem and provide an efficient, iterative solution. We illustrate the behavior of the proposed model using a synthetic dataset where we induce varied feature-dependent task relationships: positive relationship, negative relationship, no relationship. Using four real datasets, we evaluate the effectiveness of the proposed model for many multi-task regression and classification problems, and demonstrate its superiority over other state-of-the-art multi-task learning models.
    @INPROCEEDINGS { gupta_rana_phung_venkatesh_pakdd15,
        AUTHOR = { Gupta, Sunil Kumar and Rana, Santu and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Collaborating Differently on Different Topics: A Multi-Relational Approach to Multi-Task Learning },
        BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2015 },
        EDITOR = { Cao, Tru and Lim, Ee-Peng and Zhou, Zhi-Hua and Ho, Tu-Bao and Cheung, David and Motoda, Hiroshi },
        VOLUME = { 9077 },
        PAGES = { 303--316 },
        ADDRESS = { Ho Chi Minh City, Vietnam },
        MONTH = { May },
        PUBLISHER = { Springer International Publishing },
        NOTE = { Best Paper Award },
        ABSTRACT = { Multi-task learning offers a way to benefit from synergy of multiple related prediction tasks via their joint modeling. Current multi-task techniques model related tasks jointly, assuming that the tasks share the same relationship across features uniformly. This assumption is seldom true as tasks may be related across some features but not others. Addressing this problem, we propose a new multi-task learning model that learns separate task relationships along different features. This added flexibility allows our model to have a finer and differential level of control in joint modeling of tasks along different features. We formulate the model as an optimization problem and provide an efficient, iterative solution. We illustrate the behavior of the proposed model using a synthetic dataset where we induce varied feature-dependent task relationships: positive relationship, negative relationship, no relationship. Using four real datasets, we evaluate the effectiveness of the proposed model for many multi-task regression and classification problems, and demonstrate its superiority over other state-of-the-art multi-task learning models. },
        DOI = { 10.1007/978-3-319-18038-0_24 },
        FILE = { :gupta_rana_phung_venkatesh_pakdd15 - Collaborating Differently on Different Topics_ a Multi Relational Approach to Multi Task Learning.pdf:PDF },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.02.08 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-18038-0_24 },
    }
C
  • Learning Conditional Latent Structures from Multiple Data Sources
    Huynh, V., Phung, D., Nguyen, X.L., Venkatesh, S. and Bui, H.H.. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 343-354, May 2015. [ | | pdf]
    Data usually present themselves in heterogeneous sources. When dealing with multiple data sources, existing models often treat them independently and thus cannot explicitly model the correlation structures among data sources. To address this problem, we propose a fully Bayesian nonparametric approach to model correlation structures among multiple and heterogeneous datasets. The proposed framework first induces a mixture distribution over the primary data source using hierarchical Dirichlet processes (HDP). Conditioned on each atom (group) discovered in the previous step, the context data sources are mutually independent and each is generated from hierarchical Dirichlet processes. In each specific application, which covariates constitute content or context(s) is determined by the nature of the data. We also derive efficient inference and exploit the conditional independence structure to propose a (conditional) parallel Gibbs sampling scheme. We demonstrate our model on the problem of latent activity discovery in pervasive computing using mobile data, and show the advantage of utilizing multiple data sources in terms of exploratory analysis as well as quantitative clustering performance.
    @INPROCEEDINGS { huynh_phung_nguyen_venkatesh_bui_pakdd15,
        AUTHOR = { Huynh, V. and Phung, D. and Nguyen, X.L. and Venkatesh, S. and Bui, H.H. },
        TITLE = { Learning Conditional Latent Structures from Multiple Data Sources },
        BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2015 },
        PAGES = { 343--354 },
        MONTH = { May },
        ABSTRACT = { Data usually present themselves in heterogeneous sources. When dealing with multiple data sources, existing models often treat them independently and thus cannot explicitly model the correlation structures among data sources. To address this problem, we propose a fully Bayesian nonparametric approach to model correlation structures among multiple and heterogeneous datasets. The proposed framework first induces a mixture distribution over the primary data source using hierarchical Dirichlet processes (HDP). Conditioned on each atom (group) discovered in the previous step, the context data sources are mutually independent and each is generated from hierarchical Dirichlet processes. In each specific application, which covariates constitute content or context(s) is determined by the nature of the data. We also derive efficient inference and exploit the conditional independence structure to propose a (conditional) parallel Gibbs sampling scheme. We demonstrate our model on the problem of latent activity discovery in pervasive computing using mobile data, and show the advantage of utilizing multiple data sources in terms of exploratory analysis as well as quantitative clustering performance. },
        DOI = { 10.1007/978-3-319-18038-0_27 },
        FILE = { :huynh_phung_nguyen_venkatesh_bui_pakdd15 - Learning Conditional Latent Structures from Multiple Data Sources.pdf:PDF },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.02.08 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-18038-0_27 },
    }
C
  • Fast One-Class Support Vector Machine for Novelty Detection
    Le, Trung, Phung, Dinh, Nguyen, Khanh and Venkatesh, Svetha. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 189-200, Ho Chi Minh City, Vietnam, May 2015. [ | | pdf]
    Novelty detection arises as an important learning task in several applications. Kernel-based approach to novelty detection has been widely used due to its theoretical rigor and elegance of geometric interpretation. However, computational complexity is a major obstacle in this approach. In this paper, leveraging on the cutting-plane framework with the well-known One-Class Support Vector Machine, we present a new solution that can scale up seamlessly with data. The first solution is exact and linear when viewed through the cutting-plane; the second employed a sampling strategy that remarkably has a constant computational complexity defined relatively to the probability of approximation accuracy. Several datasets are benchmarked to demonstrate the credibility of our framework.
    @INPROCEEDINGS { le_phung_nguyen_venkatesh_pakdd15,
        AUTHOR = { Le, Trung and Phung, Dinh and Nguyen, Khanh and Venkatesh, Svetha },
        TITLE = { Fast {O}ne-{C}lass {S}upport {V}ector {M}achine for Novelty Detection },
        BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2015 },
        EDITOR = { Cao, Tru and Lim, Ee-Peng and Zhou, Zhi-Hua and Ho, Tu-Bao and Cheung, David and Motoda, Hiroshi },
        VOLUME = { 9078 },
        SERIES = { Lecture Notes in Computer Science },
        PAGES = { 189--200 },
        ADDRESS = { Ho Chi Minh City, Vietnam },
        MONTH = { May },
        PUBLISHER = { Springer International Publishing },
        ABSTRACT = { Novelty detection arises as an important learning task in several applications. Kernel-based approach to novelty detection has been widely used due to its theoretical rigor and elegance of geometric interpretation. However, computational complexity is a major obstacle in this approach. In this paper, leveraging on the cutting-plane framework with the well-known One-Class Support Vector Machine, we present a new solution that can scale up seamlessly with data. The first solution is exact and linear when viewed through the cutting-plane; the second employed a sampling strategy that remarkably has a constant computational complexity defined relatively to the probability of approximation accuracy. Several datasets are benchmarked to demonstrate the credibility of our framework. },
        DOI = { 10.1007/978-3-319-18032-8_15 },
        FILE = { :le_phung_nguyen_venkatesh_pakdd15 - Fast One Class Support Vector Machine for Novelty Detection.pdf:PDF },
        KEYWORDS = { One-class Support Vector Machine, Novelty detection, Large-scale dataset },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.02.08 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-18032-8_15 },
    }
C
  • Small-Variance Asymptotics for Bayesian Nonparametric Models with Constraints
    Li, C., Rana, S., Phung, D. and Venkatesh, S.. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 92-105, May 2015. [ | | pdf]
    The users often have additional knowledge when Bayesian nonparametric models (BNP) are employed, e.g. for clustering there may be prior knowledge that some of the data instances should be in the same cluster (must-link constraint) or in different clusters (cannot-link constraint), and similarly for topic modeling some words should be grouped together or separately because of an underlying semantic. This can be achieved by imposing appropriate sampling probabilities based on such constraints. However, the traditional inference technique of BNP models via Gibbs sampling is time consuming and is not scalable for large data. Variational approximations are faster but many times they do not offer good solutions. Addressing this we present a small-variance asymptotic analysis of the MAP estimates of BNP models with constraints. We derive the objective function for Dirichlet process mixture model with constraints and devise a simple and efficient K-means type algorithm. We further extend the small-variance analysis to hierarchical BNP models with constraints and devise a similar simple objective function. Experiments on synthetic and real data sets demonstrate the efficiency and effectiveness of our algorithms.
    @INPROCEEDINGS { li_rana_phung_venkatesh_pakdd15,
        AUTHOR = { Li, C. and Rana, S. and Phung, D. and Venkatesh, S. },
        TITLE = { Small-Variance Asymptotics for {B}ayesian Nonparametric Models with Constraints },
        BOOKTITLE = { Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2015 },
        PAGES = { 92--105 },
        MONTH = { May },
        ABSTRACT = { The users often have additional knowledge when Bayesian nonparametric models (BNP) are employed, e.g. for clustering there may be prior knowledge that some of the data instances should be in the same cluster (must-link constraint) or in different clusters (cannot-link constraint), and similarly for topic modeling some words should be grouped together or separately because of an underlying semantic. This can be achieved by imposing appropriate sampling probabilities based on such constraints. However, the traditional inference technique of BNP models via Gibbs sampling is time consuming and is not scalable for large data. Variational approximations are faster but many times they do not offer good solutions. Addressing this we present a small-variance asymptotic analysis of the MAP estimates of BNP models with constraints. We derive the objective function for Dirichlet process mixture model with constraints and devise a simple and efficient K-means type algorithm. We further extend the small-variance analysis to hierarchical BNP models with constraints and devise a similar simple objective function. Experiments on synthetic and real data sets demonstrate the efficiency and effectiveness of our algorithms. },
        DOI = { 10.1007/978-3-319-18032-8_8 },
        FILE = { :li_rana_phung_venkatesh_pakdd15 - Small Variance Asymptotics for Bayesian Nonparametric Models with Constraints.pdf:PDF },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.02.08 },
        URL = { http://link.springer.com/chapter/10.1007/978-3-319-18032-8_8 },
    }
C
  • Is Demography Destiny? Application of Machine Learning Techniques to Accurately Predict Population Health Outcomes from a Minimal Demographic Dataset
    Luo, Wei, Nguyen, Thin, Nichols, Melanie, Tran, Truyen, Rana, Santu, Gupta, Sunil, Phung, Dinh, Venkatesh, Svetha and Allender, Steve. PLOS ONE, 10(5):1-13, May 2015. [ | | pdf]
    For years, we have relied on population surveys to keep track of regional public health statistics, including the prevalence of non-communicable diseases. Because of the cost and limitations of such surveys, we often do not have the up-to-date data on health outcomes of a region. In this paper, we examined the feasibility of inferring regional health outcomes from socio-demographic data that are widely available and timely updated through national censuses and community surveys. Using data for 50 American states (excluding Washington DC) from 2007 to 2012, we constructed a machine-learning model to predict the prevalence of six non-communicable disease (NCD) outcomes (four NCDs and two major clinical risk factors), based on population socio-demographic characteristics from the American Community Survey. We found that regional prevalence estimates for non-communicable diseases can be reasonably predicted. The predictions were highly correlated with the observed data, in both the states included in the derivation model (median correlation 0.88) and those excluded from the development for use as a completely separated validation sample (median correlation 0.85), demonstrating that the model had sufficient external validity to make good predictions, based on demographics alone, for areas not included in the model development. This highlights both the utility of this sophisticated approach to model development, and the vital importance of simple socio-demographic characteristics as both indicators and determinants of chronic disease.
    @ARTICLE { luo_nguyen_nichols_tran_rana_gupta_phung_venkatesh_allender_pone15demography,
        AUTHOR = { Luo, Wei and Nguyen, Thin and Nichols, Melanie and Tran, Truyen and Rana, Santu and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha and Allender, Steve },
        TITLE = { Is Demography Destiny? Application of Machine Learning Techniques to Accurately Predict Population Health Outcomes from a Minimal Demographic Dataset },
        JOURNAL = { PLOS ONE },
        YEAR = { 2015 },
        VOLUME = { 10 },
        NUMBER = { 5 },
        PAGES = { 1--13 },
        MONTH = { May },
        ABSTRACT = { For years, we have relied on population surveys to keep track of regional public health statistics, including the prevalence of non-communicable diseases. Because of the cost and limitations of such surveys, we often do not have the up-to-date data on health outcomes of a region. In this paper, we examined the feasibility of inferring regional health outcomes from socio-demographic data that are widely available and timely updated through national censuses and community surveys. Using data for 50 American states (excluding Washington DC) from 2007 to 2012, we constructed a machine-learning model to predict the prevalence of six non-communicable disease (NCD) outcomes (four NCDs and two major clinical risk factors), based on population socio-demographic characteristics from the American Community Survey. We found that regional prevalence estimates for non-communicable diseases can be reasonably predicted. The predictions were highly correlated with the observed data, in both the states included in the derivation model (median correlation 0.88) and those excluded from the development for use as a completely separated validation sample (median correlation 0.85), demonstrating that the model had sufficient external validity to make good predictions, based on demographics alone, for areas not included in the model development. This highlights both the utility of this sophisticated approach to model development, and the vital importance of simple socio-demographic characteristics as both indicators and determinants of chronic disease. },
        DOI = { 10.1371/journal.pone.0125602 },
        FILE = { :luo_nguyen_nichols_tran_rana_gupta_phung_venkatesh_allender_pone15demography - Is Demography Destiny.pdf:PDF },
        OWNER = { dinh },
        TIMESTAMP = { 2015.06.10 },
        URL = { http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0125602 },
    }
J
  • What shall I share and with Whom? - A Multi-Task Learning Formulation using Multi-Faceted Task Relationships
    Gupta, Sunil, Rana, Santu, Phung, Dinh and Venkatesh, Svetha. In SIAM Intl. Conf. on Data Mining (SDM), pages 703-711, Vancouver, Canada, May 2015. [ | | pdf]
    Multi-task learning is a learning paradigm that improves the performance of "related" tasks through their joint learning. To do this each task answers the question "Which other task should I share with?" This task relatedness can be complex - a task may be related to one set of tasks based on one subset of features and to other tasks based on other subsets. Existing multi-task learning methods do not explicitly model this reality, learning a single-faceted task relationship over all the features. This degrades performance by forcing a task to become similar to other tasks even on their unrelated features. Addressing this gap, we propose a novel multi-task learning model that learns multi-faceted task relationships, allowing tasks to collaborate differentially on different feature subsets. This is achieved by simultaneously learning a low dimensional subspace for task parameters and inducing task groups over each latent subspace basis using a novel combination of L_{1} and pairwise L_{\infty} norms. Further, our model can induce grouping across both positively and negatively related tasks, which helps towards exploiting knowledge from all types of related tasks. We validate our model on two synthetic and five real datasets, and show significant performance improvements over several state-of-the-art multi-task learning techniques. Thus our model effectively answers for each task: What shall I share and with whom?
    @INPROCEEDINGS { gupta_rana_phung_venkatesh_sdm15,
        AUTHOR = { Gupta, Sunil and Rana, Santu and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { What shall I share and with Whom? - A Multi-Task Learning Formulation using Multi-Faceted Task Relationships },
        BOOKTITLE = { SIAM Intl. Conf. on Data Mining (SDM) },
        YEAR = { 2015 },
        PAGES = { 703--711 },
        ADDRESS = { Vancouver, Canada },
        MONTH = { May },
        ABSTRACT = { Multi-task learning is a learning paradigm that improves the performance of "related" tasks through their joint learning. To do this each task answers the question "Which other task should I share with?" This task relatedness can be complex - a task may be related to one set of tasks based on one subset of features and to other tasks based on other subsets. Existing multi-task learning methods do not explicitly model this reality, learning a single-faceted task relationship over all the features. This degrades performance by forcing a task to become similar to other tasks even on their unrelated features. Addressing this gap, we propose a novel multi-task learning model that learns multi-faceted task relationships, allowing tasks to collaborate differentially on different feature subsets. This is achieved by simultaneously learning a low dimensional subspace for task parameters and inducing task groups over each latent subspace basis using a novel combination of L_{1} and pairwise L_{\infty} norms. Further, our model can induce grouping across both positively and negatively related tasks, which helps towards exploiting knowledge from all types of related tasks. We validate our model on two synthetic and five real datasets, and show significant performance improvements over several state-of-the-art multi-task learning techniques. Thus our model effectively answers for each task: What shall I share and with whom? },
        DOI = { 10.1137/1.9781611974010.79 },
        FILE = { :gupta_rana_phung_venkatesh_sdm15 - What Shall I Share and with Whom_ a Multi Task Learning Formulation Using Multi Faceted Task Relationships.pdf:PDF },
        OWNER = { thinng },
        TIMESTAMP = { 2015.09.16 },
        URL = { http://epubs.siam.org/doi/abs/10.1137/1.9781611974010.79 },
    }
C
  • Learning vector representation of medical objects via EMR-driven nonnegative restricted Boltzmann machines
    Tran, Truyen, Nguyen, Tu, Phung, Dinh and Venkatesh, Svetha. Journal of Biomedical Informatics (JBI), 54:96-105, April 2015. [ | | pdf]
    Electronic medical record (EMR) offers promises for novel analytics. However, manual feature engineering from EMR is labor intensive because EMR is complex – it contains temporal, mixed-type and multimodal data packed in irregular episodes. We present a computational framework to harness EMR with minimal human supervision via restricted Boltzmann machine (RBM). The framework derives a new representation of medical objects by embedding them in a low-dimensional vector space. This new representation facilitates algebraic and statistical manipulations such as projection onto 2D plane (thereby offering intuitive visualization), object grouping (hence enabling automated phenotyping), and risk stratification. To enhance model interpretability, we introduced two constraints into model parameters: (a) nonnegative coefficients, and (b) structural smoothness. These result in a novel model called eNRBM (EMR-driven nonnegative RBM). We demonstrate the capability of the eNRBM on a cohort of 7578 mental health patients under suicide risk assessment. The derived representation not only shows clinically meaningful feature grouping but also facilitates short-term risk stratification. The F-scores, 0.21 for moderate-risk and 0.36 for high-risk, are significantly higher than those obtained by clinicians and competitive with the results obtained by support vector machines.
    @ARTICLE { tran_nguyen_phung_venkatesh_bi15learning,
        AUTHOR = { Tran, Truyen and Nguyen, Tu and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Learning vector representation of medical objects via {EMR}-driven nonnegative restricted {B}oltzmann machines },
        JOURNAL = { Journal of Biomedical Informatics (JBI) },
        YEAR = { 2015 },
        VOLUME = { 54 },
        PAGES = { 96--105 },
        MONTH = { April },
        ABSTRACT = { Electronic medical record (EMR) offers promises for novel analytics. However, manual feature engineering from EMR is labor intensive because EMR is complex – it contains temporal, mixed-type and multimodal data packed in irregular episodes. We present a computational framework to harness EMR with minimal human supervision via restricted Boltzmann machine (RBM). The framework derives a new representation of medical objects by embedding them in a low-dimensional vector space. This new representation facilitates algebraic and statistical manipulations such as projection onto 2D plane (thereby offering intuitive visualization), object grouping (hence enabling automated phenotyping), and risk stratification. To enhance model interpretability, we introduced two constraints into model parameters: (a) nonnegative coefficients, and (b) structural smoothness. These result in a novel model called eNRBM (EMR-driven nonnegative RBM). We demonstrate the capability of the eNRBM on a cohort of 7578 mental health patients under suicide risk assessment. The derived representation not only shows clinically meaningful feature grouping but also facilitates short-term risk stratification. The F-scores, 0.21 for moderate-risk and 0.36 for high-risk, are significantly higher than those obtained by clinicians and competitive with the results obtained by support vector machines. },
        DOI = { 10.1016/j.jbi.2015.01.012 },
        FILE = { :tran_nguyen_phung_venkatesh_bi15learning - Learning Vector Representation of Medical Objects Via EMR Driven Nonnegative Restricted Boltzmann Machines.pdf:PDF },
        KEYWORDS = { Electronic medical records, Vector representation, Medical objects embedding, Feature grouping, Suicide risk stratification },
        TIMESTAMP = { 2015.01.29 },
        URL = { http://www.sciencedirect.com/science/article/pii/S1532046415000143 },
    }
J
  • Topic Model Kernel Classification With Probabilistically Reduced Features
    Nguyen, Vu, Phung, Dinh and Venkatesh, Svetha. Journal of Data Science (JDS), 13(2):323-340, April 2015. [ | | pdf]
    Probabilistic topic models have become a standard in modern machine learning to deal with a wide range of applications. Representing data by dimensional reduction of mixture proportions extracted from topic models is not only richer in semantic interpretation, but can also be informative for classification tasks. In this paper, we describe the Topic Model Kernel (TMK), a topic-based kernel for Support Vector Machine classification on data being processed by probabilistic topic models. The applicability of our proposed kernel is demonstrated in several classification tasks with real world datasets. TMK outperforms existing kernels on the distributional features and gives comparable results on nonprobabilistic data types.
    @ARTICLE { nguyen_phung_venkatesh_jds15,
        AUTHOR = { Nguyen, Vu and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Topic Model Kernel Classification With Probabilistically Reduced Features },
        JOURNAL = { Journal of Data Science (JDS) },
        YEAR = { 2015 },
        VOLUME = { 13 },
        NUMBER = { 2 },
        PAGES = { 323--340 },
        MONTH = { April },
        ABSTRACT = { Probabilistic topic models have become a standard in modern machine learning to deal with a wide range of applications. Representing data by dimensionality reduction of mixture proportions extracted from topic models is not only richer in semantic interpretation, but can also be informative for classification tasks. In this paper, we describe the Topic Model Kernel (TMK), a topic-based kernel for Support Vector Machine classification on data processed by probabilistic topic models. The applicability of our proposed kernel is demonstrated in several classification tasks with real-world datasets. TMK outperforms existing kernels on the distributional features and gives comparable results on non-probabilistic data types. },
        FILE = { :nguyen_phung_venkatesh_jds15 - Topic Model Kernel Classification with Probabilistically Reduced Features.pdf:PDF },
        KEYWORDS = { Topic Models, Bayesian Nonparametric, Support Vector Machine, Kernel Method, Classification, Dimensionality Reduction },
        OWNER = { thinng },
        TIMESTAMP = { 2015.01.28 },
        URL = { http://www.jds-online.com/file_download/496/6-new.pdf },
    }
J
  • Bayesian Nonparametric Approaches to Abnormality Detection in Video Surveillance
    Nguyen, Vu, Phung, Dinh, Pham, Duc-Son and Venkatesh, Svetha. Annals of Data Science (AoDS), 2(1):21-41, March 2015. [ | | pdf]
    In data science, anomaly detection is the process of identifying items, events or observations which do not conform to expected patterns in a dataset. As widely acknowledged in the computer vision and security management communities, discovering suspicious events is the key issue in abnormality detection for video surveillance. The important steps in identifying such events include stream data segmentation and hidden pattern discovery. However, the crucial challenge is that the number of coherent segments in the surveillance stream and the number of traffic patterns are unknown and hard to specify. Therefore, in this paper we revisit the abnormality detection problem through the lens of Bayesian nonparametrics (BNP) and develop a novel usage of BNP methods for this problem. In particular, we employ the Infinite Hidden Markov Model and Bayesian Nonparametric Factor Analysis for stream data segmentation and pattern discovery. In addition, we introduce an interactive system allowing users to inspect and browse suspicious events.
    @ARTICLE { nguyen_phung_pham_venkatesh_aods15bayesian,
        AUTHOR = { Nguyen, Vu and Phung, Dinh and Pham, Duc-Son and Venkatesh, Svetha },
        TITLE = { {B}ayesian Nonparametric Approaches to Abnormality Detection in Video Surveillance },
        JOURNAL = { Annals of Data Science (AoDS) },
        YEAR = { 2015 },
        VOLUME = { 2 },
        NUMBER = { 1 },
        PAGES = { 21--41 },
        MONTH = { March },
        ABSTRACT = { In data science, anomaly detection is the process of identifying items, events or observations which do not conform to expected patterns in a dataset. As widely acknowledged in the computer vision and security management communities, discovering suspicious events is the key issue in abnormality detection for video surveillance. The important steps in identifying such events include stream data segmentation and hidden pattern discovery. However, the crucial challenge is that the number of coherent segments in the surveillance stream and the number of traffic patterns are unknown and hard to specify. Therefore, in this paper we revisit the abnormality detection problem through the lens of Bayesian nonparametrics (BNP) and develop a novel usage of BNP methods for this problem. In particular, we employ the Infinite Hidden Markov Model and Bayesian Nonparametric Factor Analysis for stream data segmentation and pattern discovery. In addition, we introduce an interactive system allowing users to inspect and browse suspicious events. },
        DOI = { 10.1007/s40745-015-0030-3 },
        FILE = { :nguyen_phung_pham_venkatesh_aods15bayesian - Bayesian Nonparametric Approaches to Abnormality Detection in Video Surveillance.pdf:PDF },
        KEYWORDS = { Abnormal detection, Bayesian nonparametric, User interface, Multilevel data structure, Video segmentation, Spatio-temporal browsing },
        OWNER = { dinh },
        PUBLISHER = { Springer Berlin Heidelberg },
        TIMESTAMP = { 2015.06.10 },
        URL = { http://link.springer.com/article/10.1007%2Fs40745-015-0030-3 },
    }
J
  • Stable feature selection for clinical prediction: Exploiting ICD tree structure using Tree-Lasso
    Kamkar, Iman, Gupta, Sunil, Phung, Dinh and Venkatesh, Svetha. Journal of Biomedical Informatics (JBI), 53:277-290, Feb. 2015. [ | | pdf]
    Modern healthcare is being reshaped by growing Electronic Medical Records (EMR). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnosis and procedure codes. These codes are usually represented in a tree form (e.g. the ICD-10 tree) and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model, and an appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test, etc.) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related l1-penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to randomly select only one feature out of many correlated features. This hinders clinicians from arriving at a stable feature set, which is crucial for the clinical decision making process. In this paper, we solve this problem by using the recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study it and compare it with other feature selection methods. Using a synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods, e.g. Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our result has implications in identifying stable risk factors for many healthcare problems and therefore can potentially assist clinical decision making for accurate medical prognosis.
    @ARTICLE { kamkar_gupta_phung_venkatesh_bi15,
        AUTHOR = { Kamkar, Iman and Gupta, Sunil and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Stable feature selection for clinical prediction: Exploiting {ICD} tree structure using Tree-Lasso },
        JOURNAL = { Journal of Biomedical Informatics (JBI) },
        YEAR = { 2015 },
        VOLUME = { 53 },
        PAGES = { 277--290 },
        MONTH = { Feb. },
        ISSN = { 1532-0464 },
        ABSTRACT = { Modern healthcare is being reshaped by growing Electronic Medical Records (EMR). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnosis and procedure codes. These codes are usually represented in a tree form (e.g. the ICD-10 tree) and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model, and an appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test, etc.) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related l1-penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to randomly select only one feature out of many correlated features. This hinders clinicians from arriving at a stable feature set, which is crucial for the clinical decision making process. In this paper, we solve this problem by using the recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study it and compare it with other feature selection methods. Using a synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods, e.g. Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our result has implications in identifying stable risk factors for many healthcare problems and therefore can potentially assist clinical decision making for accurate medical prognosis. },
        DOI = { http://dx.doi.org/10.1016/j.jbi.2014.11.013 },
        FILE = { :kamkar_gupta_phung_venkatesh_bi15 - Stable Feature Selection for Clinical Prediction_ Exploiting ICD Tree Structure Using Tree Lasso.pdf:PDF },
        KEYWORDS = { Feature selection, Lasso, Tree-Lasso, Feature stability, Classification },
        URL = { http://www.sciencedirect.com/science/article/pii/S1532046414002639 },
    }
J
  • Tree-based Iterated Local Search for Markov Random Fields with Applications in Image Analysis
    Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. Journal of Heuristics, 21(1):25-45, Feb. 2015. [ | | pdf]
    The maximum a posteriori assignment for general structure Markov random fields is computationally intractable. In this paper, we exploit tree-based methods to efficiently address this problem. Our novel method, named Tree-based Iterated Local Search (T-ILS), takes advantage of the tractability of tree-structures embedded within MRFs to derive strong local search in an ILS framework. The method efficiently explores exponentially large neighborhoods using a limited memory without any requirement on the cost functions. We evaluate the T-ILS on a simulated Ising model and two real-world vision problems: stereo matching and image denoising. Experimental results demonstrate that our methods are competitive against state-of-the-art rivals with significant computational gain.
    @ARTICLE { tran_phung_venkatesh_jh15,
        AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Tree-based Iterated Local Search for {M}arkov {R}andom {F}ields with Applications in Image Analysis },
        JOURNAL = { Journal of Heuristics },
        YEAR = { 2015 },
        VOLUME = { 21 },
        NUMBER = { 1 },
        PAGES = { 25--45 },
        MONTH = { Feb. },
        ABSTRACT = { The maximum a posteriori assignment for general structure Markov random fields is computationally intractable. In this paper, we exploit tree-based methods to efficiently address this problem. Our novel method, named Tree-based Iterated Local Search (T-ILS), takes advantage of the tractability of tree-structures embedded within MRFs to derive strong local search in an ILS framework. The method efficiently explores exponentially large neighborhoods using a limited memory without any requirement on the cost functions. We evaluate the T-ILS on a simulated Ising model and two real-world vision problems: stereo matching and image denoising. Experimental results demonstrate that our methods are competitive against state-of-the-art rivals with significant computational gain. },
        DOI = { 10.1007/s10732-014-9270-1 },
        FILE = { :tran_phung_venkatesh_jh15 - Tree Based Iterated Local Search for Markov Random Fields with Applications in Image Analysis.pdf:PDF },
        KEYWORDS = { Iterated local search, Strong local search, Belief propagation, Markov random fields, MAP assignment },
        OWNER = { tund },
        PUBLISHER = { Springer },
        TIMESTAMP = { 2014.10.14 },
        URL = { http://link.springer.com/article/10.1007%2Fs10732-014-9270-1 },
    }
J
  • Tensor-variate Restricted Boltzmann Machines
    Nguyen, Tu Dinh, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In 29th AAAI Conference on Artificial Intelligence (AAAI), pages 2887-2893, Austin Texas, USA, January 2015. [ | | pdf]
    Restricted Boltzmann Machines (RBMs) are an important class of latent variable models for representing vector data. An under-explored area is multimode data, where each data point is a matrix or a tensor. Standard RBMs applied to such data would require vectorizing matrices and tensors, thus resulting in unnecessarily high dimensionality and, at the same time, destroying the inherent higher-order interaction structures. This paper introduces Tensor-variate Restricted Boltzmann Machines (TvRBMs), which generalize RBMs to capture the multiplicative interaction between data modes and the latent variables. TvRBMs are highly compact in that the number of free parameters grows only linearly with the number of modes. We demonstrate the capacity of TvRBMs on three real-world applications: handwritten digit classification, face recognition and EEG-based alcoholic diagnosis. The learnt features of the model are more discriminative than the rivals, resulting in better classification performance.
    @INPROCEEDINGS { nguyen_tran_phung_venkatesh_aaai15,
        AUTHOR = { Nguyen, Tu Dinh and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        TITLE = { Tensor-variate Restricted {B}oltzmann Machines },
        BOOKTITLE = { 29th AAAI Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2015 },
        PAGES = { 2887--2893 },
        ADDRESS = { Austin Texas, USA },
        MONTH = { January },
        ABSTRACT = { Restricted Boltzmann Machines (RBMs) are an important class of latent variable models for representing vector data. An under-explored area is multimode data, where each data point is a matrix or a tensor. Standard RBMs applied to such data would require vectorizing matrices and tensors, thus resulting in unnecessarily high dimensionality and, at the same time, destroying the inherent higher-order interaction structures. This paper introduces Tensor-variate Restricted Boltzmann Machines (TvRBMs), which generalize RBMs to capture the multiplicative interaction between data modes and the latent variables. TvRBMs are highly compact in that the number of free parameters grows only linearly with the number of modes. We demonstrate the capacity of TvRBMs on three real-world applications: handwritten digit classification, face recognition and EEG-based alcoholic diagnosis. The learnt features of the model are more discriminative than the rivals, resulting in better classification performance. },
        FILE = { :nguyen_tran_phung_venkatesh_aaai15 - Tensor Variate Restricted Boltzmann Machines.pdf:PDF },
        KEYWORDS = { tensor; rbm; restricted boltzmann machine; tvrbm; multiplicative interaction; eeg; },
        OWNER = { ngtu },
        TIMESTAMP = { 2015.01.29 },
        URL = { http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/download/9371/9956 },
    }
C
  • Continuous discovery of co-location contexts from Bluetooth data
    Nguyen, T., Gupta, S., Venkatesh, S. and Phung, D.. Pervasive and Mobile Computing (PMC), 16(B):286 - 304, Jan. 2015. [ | | pdf]
    The discovery of context is important for context-aware applications in pervasive computing. This problem is challenging because of the stream nature of the data and the complexity and changing nature of contexts. We propose a Bayesian nonparametric model for the detection of co-location contexts from Bluetooth signals. By using an Indian buffet process as the prior distribution, the model can discover the number of contexts automatically. We introduce a novel fixed-lag particle filter that processes data incrementally. This sampling scheme is especially suitable for pervasive computing as the computational requirements remain constant in spite of growing data. We examine our model on a synthetic dataset and two real-world datasets. To verify the discovered contexts, we compare them to the communities detected by the Louvain method, showing a strong correlation between the results of the two methods. The fixed-lag particle filter is compared with Gibbs sampling in terms of the normalized factorization error, which shows close performance between the two inference methods. As the fixed-lag particle filter processes a small chunk of data as it arrives and does not need to be restarted, its execution time is significantly shorter than that of Gibbs sampling.
    @ARTICLE { nguyen_gupta_venkatesh_phung_pmc15,
        AUTHOR = { Nguyen, T. and Gupta, S. and Venkatesh, S. and Phung, D. },
        TITLE = { Continuous discovery of co-location contexts from {B}luetooth data },
        JOURNAL = { Pervasive and Mobile Computing (PMC) },
        YEAR = { 2015 },
        VOLUME = { 16 },
        NUMBER = { B },
        PAGES = { 286--304 },
        MONTH = { Jan. },
        ISSN = { 1574-1192 },
        ABSTRACT = { The discovery of context is important for context-aware applications in pervasive computing. This problem is challenging because of the stream nature of the data and the complexity and changing nature of contexts. We propose a Bayesian nonparametric model for the detection of co-location contexts from Bluetooth signals. By using an Indian buffet process as the prior distribution, the model can discover the number of contexts automatically. We introduce a novel fixed-lag particle filter that processes data incrementally. This sampling scheme is especially suitable for pervasive computing as the computational requirements remain constant in spite of growing data. We examine our model on a synthetic dataset and two real-world datasets. To verify the discovered contexts, we compare them to the communities detected by the Louvain method, showing a strong correlation between the results of the two methods. The fixed-lag particle filter is compared with Gibbs sampling in terms of the normalized factorization error, which shows close performance between the two inference methods. As the fixed-lag particle filter processes a small chunk of data as it arrives and does not need to be restarted, its execution time is significantly shorter than that of Gibbs sampling. },
        DOI = { 10.1016/j.pmcj.2014.12.005 },
        FILE = { :nguyen_gupta_venkatesh_phung_pmc15 - Continuous Discovery of Co Location Contexts from Bluetooth Data.pdf:PDF },
        KEYWORDS = { Nonparametric, Indian buffet process, Incremental, Particle filter, Co-location context },
        OWNER = { Thuong Nguyen },
        PUBLISHER = { Elsevier },
        TIMESTAMP = { 2014.12.18 },
        URL = { http://www.sciencedirect.com/science/article/pii/S1574119214001941 },
    }
J
  • Visual Object Clustering via Mixed-Norm Regularization
    Zhang, Xin, Pham, Duc-Son, Phung, Dinh, Liu, Wanquan, Saha, Budhaditya and Venkatesh, Svetha. In Winter Conference on Applications of Computer Vision (WACV), pages 1030-1037, Jan. 2015. [ | | pdf]
    Many vision problems deal with high-dimensional data, such as motion segmentation and face clustering. However, these high-dimensional data usually lie in a low-dimensional structure. Sparse representation is a powerful principle for solving a number of clustering problems with high-dimensional data. This principle is motivated from an ideal modeling of data points according to linear algebra theory. However, real data in computer vision are unlikely to follow the ideal model perfectly. In this paper, we exploit mixed norm regularization for sparse subspace clustering. This regularization term is a convex combination of the ℓ1 norm, which promotes sparsity at the individual level, and the block norm ℓ2/1, which promotes group sparsity. Combining these powerful regularization terms provides more accurate modeling, subsequently leading to a better solution for the affinity matrix used in sparse subspace clustering, and helps achieve better performance on motion segmentation and face clustering problems. This formulation also caters for different types of data corruptions. We derive a provably convergent algorithm based on the alternating direction method of multipliers (ADMM) framework, which is computationally efficient, to solve the formulation. We demonstrate that this formulation outperforms other state-of-the-art methods on both motion segmentation and face clustering.
    @INPROCEEDINGS { zhang_pham_phung_liu_budhaditya_venkatesh_wacv15,
        AUTHOR = { Zhang, Xin and Pham, Duc-Son and Phung, Dinh and Liu, Wanquan and Saha, Budhaditya and Venkatesh, Svetha },
        TITLE = { Visual Object Clustering via Mixed-Norm Regularization },
        BOOKTITLE = { Winter Conference on Applications of Computer Vision (WACV) },
        YEAR = { 2015 },
        PAGES = { 1030--1037 },
        MONTH = { Jan. },
        ABSTRACT = { Many vision problems deal with high-dimensional data, such as motion segmentation and face clustering. However, these high-dimensional data usually lie in a low-dimensional structure. Sparse representation is a powerful principle for solving a number of clustering problems with high-dimensional data. This principle is motivated from an ideal modeling of data points according to linear algebra theory. However, real data in computer vision are unlikely to follow the ideal model perfectly. In this paper, we exploit mixed norm regularization for sparse subspace clustering. This regularization term is a convex combination of the ℓ1 norm, which promotes sparsity at the individual level, and the block norm ℓ2/1, which promotes group sparsity. Combining these powerful regularization terms provides more accurate modeling, subsequently leading to a better solution for the affinity matrix used in sparse subspace clustering, and helps achieve better performance on motion segmentation and face clustering problems. This formulation also caters for different types of data corruptions. We derive a provably convergent algorithm based on the alternating direction method of multipliers (ADMM) framework, which is computationally efficient, to solve the formulation. We demonstrate that this formulation outperforms other state-of-the-art methods on both motion segmentation and face clustering. },
        DOI = { 10.1109/WACV.2015.142 },
        FILE = { :zhang_pham_phung_liu_budhaditya_venkatesh_wacv15 - Visual Object Clustering Via Mixed Norm Regularization.pdf:PDF },
        KEYWORDS = { computer vision;image segmentation;matrix algebra;pattern clustering;alternating direction method of multipliers framework;computer vision;face clustering problems;linear algebra theory;mixed-norm regularization;motion segmentation;sparse representation;sparse subspace clustering;visual object clustering problem;Clustering algorithms;Computer vision;Data models;Educational institutions;Face;Motion segmentation;Sparse matrices },
        OWNER = { Dinh },
        TIMESTAMP = { 2015.02.03 },
        URL = { http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7045996 },
    }
C
  • Web search activity data accurately predicts population chronic disease risk in the United States
    Nguyen, Thin, Tran, Truyen, Luo, Wei, Gupta, Sunil, Rana, Santu, Phung, Dinh, Nichols, Melanie, Millar, Lynne, Venkatesh, Svetha and Allender, Steve. Journal of Epidemiology & Community Health, 69(7):693-699, Jan. 2015. [ | | pdf]
    Background: The WHO framework for non-communicable disease (NCD) describes risks and outcomes comprising the majority of the global burden of disease. These factors are complex and interact at biological, behavioural, environmental and policy levels, presenting challenges for population monitoring and intervention evaluation. This paper explores the utility of machine learning methods applied to population-level web search activity behaviour as a proxy for chronic disease risk factors. Methods: Web activity output for each element of the WHO's Causes of NCD framework was used as a basis for identifying relevant web search activity from 2004 to 2013 for the USA. Multiple linear regression models with regularisation were used to generate predictive algorithms, mapping web search activity to Centers for Disease Control and Prevention (CDC) measured risk factor/disease prevalence. Predictions for subsequent target years not included in the model derivation were tested against CDC data from population surveys using Pearson correlation and Spearman's r. Results: For 2011 and 2012, predicted prevalence was very strongly correlated with measured risk data, ranging from fruits and vegetables consumed (r=0.81; 95% CI 0.68 to 0.89) to alcohol consumption (r=0.96; 95% CI 0.93 to 0.98). Mean difference between predicted and measured differences by State ranged from 0.03 to 2.16. Spearman's r for state-wise predicted versus measured prevalence varied from 0.82 to 0.93. Conclusions: The high predictive validity of web search activity for NCD risk has potential to provide real-time information on population risk during policy implementation and other population-level NCD prevention efforts.
    @ARTICLE { nguyen_tran_luo_gupta_rana_phung_nichols_millar_venkatesh_allender_jech15,
        AUTHOR = { Nguyen, Thin and Tran, Truyen and Luo, Wei and Gupta, Sunil and Rana, Santu and Phung, Dinh and Nichols, Melanie and Millar, Lynne and Venkatesh, Svetha and Allender, Steve },
        TITLE = { Web search activity data accurately predicts population chronic disease risk in the {U}nited {S}tates },
        JOURNAL = { Journal of Epidemiology \& Community Health },
        YEAR = { 2015 },
        VOLUME = { 69 },
        NUMBER = { 7 },
        PAGES = { 693--699 },
        MONTH = { Jan. },
        ISSN = { 1949-3045 },
        ABSTRACT = { Background: The WHO framework for non-communicable disease (NCD) describes risks and outcomes comprising the majority of the global burden of disease. These factors are complex and interact at biological, behavioural, environmental and policy levels, presenting challenges for population monitoring and intervention evaluation. This paper explores the utility of machine learning methods applied to population-level web search activity behaviour as a proxy for chronic disease risk factors. Methods: Web activity output for each element of the WHO's Causes of NCD framework was used as a basis for identifying relevant web search activity from 2004 to 2013 for the USA. Multiple linear regression models with regularisation were used to generate predictive algorithms, mapping web search activity to Centers for Disease Control and Prevention (CDC) measured risk factor/disease prevalence. Predictions for subsequent target years not included in the model derivation were tested against CDC data from population surveys using Pearson correlation and Spearman's r. Results: For 2011 and 2012, predicted prevalence was very strongly correlated with measured risk data, ranging from fruits and vegetables consumed (r=0.81; 95% CI 0.68 to 0.89) to alcohol consumption (r=0.96; 95% CI 0.93 to 0.98). Mean difference between predicted and measured differences by State ranged from 0.03 to 2.16. Spearman's r for state-wise predicted versus measured prevalence varied from 0.82 to 0.93. Conclusions: The high predictive validity of web search activity for NCD risk has potential to provide real-time information on population risk during policy implementation and other population-level NCD prevention efforts. },
        DOI = { 10.1136/jech-2014-204523 },
        FILE = { :nguyen_tran_luo_gupta_rana_phung_nichols_millar_venkatesh_allender_jech15 - Web Search Activity Data Accurately Predicts Population Chronic Disease Risk in the United States.pdf:PDF },
        OWNER = { thinng },
        TIMESTAMP = { 2015.01.28 },
        URL = { http://jech.bmj.com/content/69/7/693.abstract },
    }
J
2014
  • A Random Finite Set Model for Data Clustering
    Phung, D. and Vo, B.N.. In Proceedings of International Conference on Fusion (FUSION), Salamanca, Spain, July 2014. [ | | pdf]
    The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a point pattern or a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is not available in advance. This paper proposes a new class of models for data clustering that addresses set-valued data as well as an unknown number of clusters, using a Dirichlet Process mixture of Poisson random finite sets. We also develop an efficient Markov Chain Monte Carlo posterior inference technique that can learn the number of clusters and mixture parameters automatically from the data. Numerical studies are presented to demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in data.
    @CONFERENCE { phung_vo_fusion14,
        TITLE = { A Random Finite Set Model for Data Clustering },
        AUTHOR = { Phung, D. and Vo, B.N. },
        BOOKTITLE = { Proceedings of International Conference on Fusion (FUSION) },
        YEAR = { 2014 },
        ADDRESS = { Salamanca, Spain },
        MONTH = { July },
        ABSTRACT = { The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a point pattern or a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is not available in advance. This paper proposes a new class of models for data clustering that addresses set-valued data as well as an unknown number of clusters, using a Dirichlet Process mixture of Poisson random finite sets. We also develop an efficient Markov Chain Monte Carlo posterior inference technique that can learn the number of clusters and mixture parameters automatically from the data. Numerical studies are presented to demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in data. },
        OWNER = { dinh },
        TIMESTAMP = { 2014.05.16 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/phung_vo_fusion14.pdf },
    }
C
  • Learning Latent Activities from Social Signals with Hierarchical Dirichlet Process
    Phung, D., Nguyen, T. C., Gupta, S. and Venkatesh, S.. In Handbook on Plan, Activity, and Intent Recognition, pages 149-174, Elsevier, 2014. [ | | pdf | code]
    Understanding human activities is an important research topic, notably in assisted living and health monitoring. Beyond simple forms of activity (e.g., the RFID event of entering a building), learning latent activities that are more semantically interpretable, such as sitting at a desk, meeting with people or gathering with friends, remains a challenging problem. Supervised learning has been the typical modeling choice in the past. However, this requires labeled training data, is unable to predict never-seen-before activities and fails to adapt to the continuing growth of data over time. In this chapter, we explore Bayesian nonparametric methods, in particular the Hierarchical Dirichlet Process, to infer latent activities from sensor data acquired in a pervasive setting. Our framework is unsupervised, requires no labeled data and is able to discover new activities as data grows. We present experiments on extracting movement and interaction activities from sociometric badge signals and show how to use them for detection of sub-communities. Using the popular Reality Mining dataset, we further demonstrate the extraction of co-location activities and use them to automatically infer the structure of social subgroups.
    @INCOLLECTION { phung_nguyen_gupta_venkatesh_pair14,
        TITLE = { Learning Latent Activities from Social Signals with Hierarchical {D}irichlet Process },
        AUTHOR = { Phung, D. and Nguyen, T. C. and Gupta, S. and Venkatesh, S. },
        BOOKTITLE = { Handbook on Plan, Activity, and Intent Recognition },
        PUBLISHER = { Elsevier },
        YEAR = { 2014 },
        EDITOR = { Gita Sukthankar and Christopher Geib and David V. Pynadath and Hung Bui and Robert P. Goldman },
        PAGES = { 149--174 },
        ABSTRACT = { Understanding human activities is an important research topic, notably in assisted living and health monitoring. Beyond simple forms of activity (e.g., the RFID event of entering a building), learning latent activities that are more semantically interpretable, such as sitting at a desk, meeting with people or gathering with friends, remains a challenging problem. Supervised learning has been the typical modeling choice in the past. However, this requires labeled training data, is unable to predict never-seen-before activities and fails to adapt to the continuing growth of data over time. In this chapter, we explore a Bayesian nonparametric method, in particular the Hierarchical Dirichlet Process, to infer latent activities from sensor data acquired in a pervasive setting. Our framework is unsupervised, requires no labeled data and is able to discover new activities as data grows. We present experiments on extracting movement and interaction activities from sociometric badge signals and show how to use them for the detection of sub-communities. Using the popular Reality Mining dataset, we further demonstrate the extraction of co-location activities and use them to automatically infer the structure of social subgroups. },
        CODE = { http://prada-research.net/~dinh/index.php?n=Main.Code#HDP_code },
        OWNER = { ctng },
        TIMESTAMP = { 2013.07.25 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/Phung_etal_pair14.pdf },
    }
BC
  • Proceedings of the Sixth Asian Conference on Machine Learning
    Phung, Dinh and Li, Hang, editors. volume 39 of JMLR Workshop and Conference Proceedings, JMLR, Nov. 2014. [ | | pdf]
    @PROCEEDINGS { phung_li_acml14proceedings,
        TITLE = { Proceedings of the Sixth Asian Conference on Machine Learning },
        YEAR = { 2014 },
        EDITOR = { Phung, Dinh and Li, Hang },
        MONTH = { Nov. },
        PUBLISHER = { JMLR },
        SERIES = { JMLR Workshop and Conference Proceedings },
        VOLUME = { 39 },
        LOCATION = { Nha Trang, Vietnam },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2016.04.11 },
        URL = { http://jmlr.org/proceedings/papers/v39/ },
    }
P
  • Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts
    Nguyen, V., Phung, D., Venkatesh, S., Nguyen, X.L. and Bui, H.. In Proc. of International Conference on Machine Learning (ICML), pages 288-296, Beijing, China, 2014. [ | ]
    We present a Bayesian nonparametric framework for multilevel clustering which utilizes group-level context information to simultaneously discover low-dimensional structures of the group contents and partition groups into clusters. Using the Dirichlet process as the building block, our model constructs a product base-measure with a nested structure to accommodate content and context observations at multiple levels. The proposed model possesses properties that link the nested Dirichlet processes (nDP) and the Dirichlet process mixture models (DPM) in an interesting way: integrating out all contents results in the DPM over contexts, whereas integrating out group-specific contexts results in the nDP mixture over content variables. We provide a Polya-urn view of the model and an efficient collapsed Gibbs inference procedure. Extensive experiments on real-world datasets demonstrate the advantage of utilizing context information via our model in both text and image domains.
    @INPROCEEDINGS { nguyen_phung_nguyen_venkatesh_bui_icml14,
        TITLE = { {B}ayesian Nonparametric Multilevel Clustering with Group-Level Contexts },
        AUTHOR = { Nguyen, V. and Phung, D. and Venkatesh, S. and Nguyen, X.L. and Bui, H. },
        BOOKTITLE = { Proc. of International Conference on Machine Learning (ICML) },
        YEAR = { 2014 },
        ADDRESS = { Beijing, China },
        PAGES = { 288--296 },
        ABSTRACT = { We present a Bayesian nonparametric framework for multilevel clustering which utilizes group-level context information to simultaneously discover low-dimensional structures of the group contents and partition groups into clusters. Using the Dirichlet process as the building block, our model constructs a product base-measure with a nested structure to accommodate content and context observations at multiple levels. The proposed model possesses properties that link the nested Dirichlet processes (nDP) and the Dirichlet process mixture models (DPM) in an interesting way: integrating out all contents results in the DPM over contexts, whereas integrating out group-specific contexts results in the nDP mixture over content variables. We provide a Polya-urn view of the model and an efficient collapsed Gibbs inference procedure. Extensive experiments on real-world datasets demonstrate the advantage of utilizing context information via our model in both text and image domains. },
        OWNER = { tvnguye },
        TIMESTAMP = { 2013.12.13 },
    }
C
  • Labeled Random Finite Sets and the Bayes Multi-target Tracking Filter
    Vo, B-N, Vo, B-T and Phung, Dinh. IEEE Transactions on Signal Processing, 62(24):6554-6567, 2014. [ | ]
    @ARTICLE { vo_vo_phung_isp14,
        TITLE = { Labeled Random Finite Sets and the Bayes Multi-target Tracking Filter },
        AUTHOR = { Vo, B-N and Vo, B-T and Phung, Dinh },
        JOURNAL = { IEEE Transactions on Signal Processing },
        YEAR = { 2014 },
        NUMBER = { 24 },
        PAGES = { 6554--6567 },
        VOLUME = { 62 },
        OWNER = { dinh },
        TIMESTAMP = { 2014.07.02 },
    }
J
  • Keeping up with Innovation: A Predictive Framework for Modeling Healthcare Data with Evolving Clinical Interventions
    Gupta, S., Rana, S., Phung, D. and Venkatesh, S.. In Proc. of SIAM Int. Conference on Data Mining (SDM), Philadelphia, Pennsylvania, USA, April 2014. [ | ]
    @INPROCEEDINGS { gupta_rana_phung_venkatesh_sdm14,
        TITLE = { Keeping up with Innovation: A Predictive Framework for Modeling Healthcare Data with Evolving Clinical Interventions },
        AUTHOR = { Gupta, S. and Rana, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proc. of SIAM Int. Conference on Data Mining (SDM) },
        YEAR = { 2014 },
        ADDRESS = { Philadelphia, Pennsylvania, USA },
        MONTH = { April },
        OWNER = { Thuongnc },
        TIMESTAMP = { 2014.01.05 },
    }
C
  • Stabilized Sparse Ordinal Regression for Medical Risk Stratification
    Truyen Tran, Dinh Phung, Wei Luo and Svetha Venkatesh. Knowledge and Information Systems (KAIS), 2014. [ | ]
    The recent wide adoption of Electronic Medical Records (EMR) presents great opportunities and challenges for data mining. The EMR data is largely temporal, often noisy, irregular and high dimensional. This paper constructs a novel ordinal regression framework for predicting medical risk stratification from EMR. First, a conceptual view of EMR as a temporal image is constructed to extract a diverse set of features. Second, ordinal modeling is applied for predicting cumulative or progressive risk. The challenges are building a transparent predictive model that works with a large number of weakly predictive features, and at the same time, is stable against resampling variations. Our solution employs sparsity methods that are stabilized through domain-specific feature interaction networks. We introduce two indices that measure the model stability against data resampling. Feature networks are used to generate two multivariate Gaussian priors with sparse precision matrices (the Laplacian and Random Walk). We apply the framework on a large short-term suicide risk prediction problem and demonstrate that our methods outperform clinicians by a large margin, discover suicide risk factors that conform with mental health knowledge, and produce models with enhanced stability.
    @ARTICLE { tran_phung_luo_venkatesh_kais14,
        TITLE = { Stabilized Sparse Ordinal Regression for Medical Risk Stratification },
        AUTHOR = { Truyen Tran and Dinh Phung and Wei Luo and Svetha Venkatesh },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2014 },
        PAGES = { (accepted for publication on 17 Jan 2014) },
        ABSTRACT = { The recent wide adoption of Electronic Medical Records (EMR) presents great opportunities and challenges for data mining. The EMR data is largely temporal, often noisy, irregular and high dimensional. This paper constructs a novel ordinal regression framework for predicting medical risk stratification from EMR. First, a conceptual view of EMR as a temporal image is constructed to extract a diverse set of features. Second, ordinal modeling is applied for predicting cumulative or progressive risk. The challenges are building a transparent predictive model that works with a large number of weakly predictive features, and at the same time, is stable against resampling variations. Our solution employs sparsity methods that are stabilized through domain-specific feature interaction networks. We introduce two indices that measure the model stability against data resampling. Feature networks are used to generate two multivariate Gaussian priors with sparse precision matrices (the Laplacian and Random Walk). We apply the framework on a large short-term suicide risk prediction problem and demonstrate that our methods outperform clinicians by a large margin, discover suicide risk factors that conform with mental health knowledge, and produce models with enhanced stability. },
        OWNER = { dinh },
        TIMESTAMP = { 2014.01.28 },
    }
J
  • Tree-based Iterated Local Search for Markov Random Fields with Applications in Image Analysis
    Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. Journal of Heuristics, 2015. [ | | pdf]
    The maximum a posteriori assignment for general structure Markov random fields is computationally intractable. In this paper, we exploit tree-based methods to efficiently address this problem. Our novel method, named Tree-based Iterated Local Search (T-ILS), takes advantage of the tractability of tree-structures embedded within MRFs to derive strong local search in an ILS framework. The method efficiently explores exponentially large neighborhoods using a limited memory without any requirement on the cost functions. We evaluate the T-ILS on a simulated Ising model and two real-world vision problems: stereo matching and image denoising. Experimental results demonstrate that our methods are competitive against state-of-the-art rivals with significant computational gain.
    @ARTICLE { tran_phung_venkatesh_jh14,
        TITLE = { Tree-based Iterated Local Search for Markov Random Fields with Applications in Image Analysis },
        AUTHOR = { Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        JOURNAL = { Journal of Heuristics },
        YEAR = { 2015 },
        PAGES = { accepted on 8 Nov 2014 },
        ABSTRACT = { The maximum a posteriori assignment for general structure Markov random fields is computationally intractable. In this paper, we exploit tree-based methods to efficiently address this problem. Our novel method, named Tree-based Iterated Local Search (T-ILS), takes advantage of the tractability of tree-structures embedded within MRFs to derive strong local search in an ILS framework. The method efficiently explores exponentially large neighborhoods using a limited memory without any requirement on the cost functions. We evaluate the T-ILS on a simulated Ising model and two real-world vision problems: stereo matching and image denoising. Experimental results demonstrate that our methods are competitive against state-of-the-art rivals with significant computational gain. },
        OWNER = { tund },
        PUBLISHER = { Springer },
        TIMESTAMP = { 2014.10.14 },
        URL = { http://link.springer.com/article/10.1007%2Fs10732-014-9270-1 },
    }
J
  • Stabilizing Sparse Cox Model using Clinical Structures in Electronic Medical Records
    Gopakumar, Shivapratap, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Proceedings of the Second International Workshop on Pattern Recognition for Healthcare Analytics, 2014. [ | | pdf]
    Stability in clinical prediction models is crucial for transferability between studies, yet has received little attention. The problem is paramount in high-dimensional data which invites sparse models with feature selection capability. We introduce an effective method to stabilize a sparse Cox model of time-to-events using clinical structures inherent in Electronic Medical Records (EMR). Model estimation is stabilized using a feature graph derived from two types of EMR structures: temporal structure of disease and intervention recurrences, and hierarchical structure of medical knowledge and practices. We demonstrate the efficacy of the method in predicting time-to-readmission of heart failure patients. On two stability measures, the Jaccard index and the Consistency index, the use of clinical structures significantly increased feature stability without hurting discriminative power. Our model reported a competitive AUC of 0.64 (95% CIs: [0.58,0.69]) for 6-month prediction.
    @INPROCEEDINGS { gopakumar_tran_phung_venkatesh_icpr_ws14,
        TITLE = { Stabilizing Sparse Cox Model using Clinical Structures in Electronic Medical Records },
        AUTHOR = { Gopakumar, Shivapratap and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Proceedings of the Second International Workshop on Pattern Recognition for Healthcare Analytics },
        YEAR = { 2014 },
        ABSTRACT = { Stability in clinical prediction models is crucial for transferability between studies, yet has received little attention. The problem is paramount in high-dimensional data which invites sparse models with feature selection capability. We introduce an effective method to stabilize a sparse Cox model of time-to-events using clinical structures inherent in Electronic Medical Records (EMR). Model estimation is stabilized using a feature graph derived from two types of EMR structures: temporal structure of disease and intervention recurrences, and hierarchical structure of medical knowledge and practices. We demonstrate the efficacy of the method in predicting time-to-readmission of heart failure patients. On two stability measures, the Jaccard index and the Consistency index, the use of clinical structures significantly increased feature stability without hurting discriminative power. Our model reported a competitive AUC of 0.64 (95% CIs: [0.58,0.69]) for 6-month prediction. },
        URL = { https://sites.google.com/site/iwprha2/proceedings },
    }
C
  • Individualized Arrhythmia Detection with ECG Signals from Wearable Devices
    Nguyen, Thanh-Binh, Luo, Wei, Caelli, Terry, Venkatesh, Svetha and Phung, Dinh. In The 2014 International Conference on Data Science and Advanced Analytics (DSAA2014), Shanghai, China, 2014. [ | ]
    Low-cost pervasive electrocardiogram (ECG) monitors are changing how sinus arrhythmias are diagnosed among patients with mild symptoms. With the large amount of data generated from long-term monitoring come new data science and analytical challenges. Although traditional rule-based detection algorithms still work on relatively short clinical-quality ECG, they are not optimal for pervasive signals collected from wearable devices: they do not adapt to individual differences and assume accurate identification of ECG fiducial points. To overcome these shortcomings of the rule-based methods, this paper introduces an arrhythmia detection approach for low-quality pervasive ECG signals. To achieve the robustness needed, two techniques were applied. First, a set of ECG features with minimal reliance on fiducial point identification were selected. Next, the features were normalized using robust statistics to factor out baseline individual differences and clinically irrelevant temporal drift that is common in pervasive ECG. The proposed method was evaluated using pervasive ECG signals we collected, in combination with clinician-validated ECG signals from Physiobank. Empirical evaluation confirms accuracy improvements of the proposed approach over the traditional clinical rules.
    @INPROCEEDINGS { nguyen_luo_caelli_venkatesh_phung_dsaa14,
        TITLE = { Individualized Arrhythmia Detection with ECG Signals from Wearable Devices },
        AUTHOR = { Nguyen, Thanh-Binh and Luo, Wei and Caelli, Terry and Venkatesh, Svetha and Phung, Dinh },
        BOOKTITLE = { The 2014 International Conference on Data Science and Advanced Analytics (DSAA2014) },
        YEAR = { 2014 },
        ADDRESS = { Shanghai, China },
        ABSTRACT = { Low-cost pervasive electrocardiogram (ECG) monitors are changing how sinus arrhythmias are diagnosed among patients with mild symptoms. With the large amount of data generated from long-term monitoring come new data science and analytical challenges. Although traditional rule-based detection algorithms still work on relatively short clinical-quality ECG, they are not optimal for pervasive signals collected from wearable devices: they do not adapt to individual differences and assume accurate identification of ECG fiducial points. To overcome these shortcomings of the rule-based methods, this paper introduces an arrhythmia detection approach for low-quality pervasive ECG signals. To achieve the robustness needed, two techniques were applied. First, a set of ECG features with minimal reliance on fiducial point identification were selected. Next, the features were normalized using robust statistics to factor out baseline individual differences and clinically irrelevant temporal drift that is common in pervasive ECG. The proposed method was evaluated using pervasive ECG signals we collected, in combination with clinician-validated ECG signals from Physiobank. Empirical evaluation confirms accuracy improvements of the proposed approach over the traditional clinical rules. },
        COMMENT = { coauthor },
        OWNER = { dbdao },
        TIMESTAMP = { 2014.08.21 },
    }
C
  • Unsupervised Inference of Significant Locations from WiFi Data for Understanding Human Dynamics
    Nguyen, Thanh-Binh, Nguyen, Thuong C., Luo, Wei, Venkatesh, Svetha and Phung, Dinh. In The 13th International Conference on Mobile and Ubiquitous Multimedia (MUM2014), pages 232-235, 2014. [ | | pdf]
    Motion and location are essential to understand human dynamics. This paper presents a method to discover significant locations and daily routines of individuals from WiFi data, which is considered more suitable for study of human dynamics than GPS data. Our method determines significant locations by clustering access points in close proximity using the Affinity Propagation algorithm, which has the advantage of automatically determining the number of locations. We demonstrate our method on the MDC dataset that includes more than 30 million WiFi scans. The experimental results show good clustering performance and also superior temporal coverage in comparison to a multimodal approach on the same dataset. From the discovered location trajectories, we can learn interesting mobility patterns of mobile phone users. The human dynamics of participants is reflected through the entropy of the location distributions which shows interesting correlation with the age and occupations of users.
    @INPROCEEDINGS { nguyen_nguyen_lou_venkatesh_phung_mum14,
        TITLE = { Unsupervised Inference of Significant Locations from WiFi Data for Understanding Human Dynamics },
        AUTHOR = { Nguyen, Thanh-Binh and Nguyen, Thuong C. and Luo, Wei and Venkatesh, Svetha and Phung, Dinh },
        BOOKTITLE = { The 13th International Conference on Mobile and Ubiquitous Multimedia (MUM2014) },
        YEAR = { 2014 },
        PAGES = { 232--235 },
        ABSTRACT = { Motion and location are essential to understand human dynamics. This paper presents a method to discover significant locations and daily routines of individuals from WiFi data, which is considered more suitable for study of human dynamics than GPS data. Our method determines significant locations by clustering access points in close proximity using the Affinity Propagation algorithm, which has the advantage of automatically determining the number of locations. We demonstrate our method on the MDC dataset that includes more than 30 million WiFi scans. The experimental results show good clustering performance and also superior temporal coverage in comparison to a multimodal approach on the same dataset. From the discovered location trajectories, we can learn interesting mobility patterns of mobile phone users. The human dynamics of participants is reflected through the entropy of the location distributions which shows interesting correlation with the age and occupations of users. },
        DOI = { 10.1145/2677972.2677997 },
        FILE = { :papers\\activityrecognition\\nguyen_nguyen_lou_venkatesh_phung_mum14.pdf:PDF },
        OWNER = { Thanh-Binh Nguyen },
        TIMESTAMP = { 2014.10.20 },
        URL = { http://dl.acm.org/citation.cfm?id=2677972.2677997&coll=DL&dl=ACM&CFID=590574626&CFTOKEN=81216827 },
    }
C
  • Analysis of Circadian Rhythms from Online Communities of Individuals with Affective Disorders
    Dao, Bo, Nguyen, Thin, Phung, Dinh and Venkatesh, Svetha. In The 2014 International Conference on Data Science and Advanced Analytics (DSAA2014), Shanghai, China, 2014. [ | ]
    @INPROCEEDINGS { dao_nguyen_phung_venkatesh_dsaa14,
        TITLE = { Analysis of Circadian Rhythms from Online Communities of Individuals with Affective Disorders },
        AUTHOR = { Dao, Bo and Nguyen, Thin and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { The 2014 International Conference on Data Science and Advanced Analytics (DSAA2014) },
        YEAR = { 2014 },
        ADDRESS = { Shanghai, China },
        COMMENT = { coauthor },
        OWNER = { dbdao },
        TIMESTAMP = { 2014.08.21 },
    }
C
  • Topic Model Kernel Classification With Probabilistically Reduced Features
    V. Nguyen, D. Phung and S. Venkatesh. Journal of Data Science, 2014. [ | ]
    @ARTICLE { nguyen_phung_venkatesh_jds14,
        TITLE = { Topic Model Kernel Classification With Probabilistically Reduced Features },
        AUTHOR = { V. Nguyen and D. Phung and S. Venkatesh },
        JOURNAL = { Journal of Data Science },
        YEAR = { 2014 },
        PAGES = { accepted on 27/10/2014 },
        OWNER = { tvnguye },
        TIMESTAMP = { 2014.11.03 },
    }
J
  • Affective and Content Analysis of Online Depression Communities
    Thin Nguyen, Dinh Phung, Bo Dao, Svetha Venkatesh and Michael Berk. IEEE Transactions on Affective Computing, 2014. [ | | pdf]
    A large number of people use online communities to discuss mental health issues, thus offering opportunities for new understanding of these communities. This paper aims to study the characteristics of online depression communities (CLINICAL) in comparison with those joining other online communities (CONTROL). We use machine learning and statistical methods to discriminate online messages between depression and control communities using mood, psycholinguistic processes and content topics extracted from the posts generated by members of these communities. All aspects, including mood, written content and writing style, are found to be significantly different between the two types of communities. Sentiment analysis shows the clinical group has lower valence than people in the control group. For language styles and topics, statistical tests reject the hypothesis of equality on psycholinguistic processes and topics between the two groups. We show good predictive validity in depression classification using topics and psycholinguistic clues as features. Clear discrimination between writing styles and contents, with good predictive power, is an important step in understanding social media and its use in mental health.
    @ARTICLE { nguyen_phung_dao_venkatesh_berk_tac14,
        TITLE = { Affective and Content Analysis of Online Depression Communities },
        AUTHOR = { Thin Nguyen and Dinh Phung and Bo Dao and Svetha Venkatesh and Michael Berk },
        JOURNAL = { IEEE Transactions on Affective Computing },
        YEAR = { 2014 },
        PAGES = { (to appear) },
        ABSTRACT = { A large number of people use online communities to discuss mental health issues, thus offering opportunities for new understanding of these communities. This paper aims to study the characteristics of online depression communities (CLINICAL) in comparison with those joining other online communities (CONTROL). We use machine learning and statistical methods to discriminate online messages between depression and control communities using mood, psycholinguistic processes and content topics extracted from the posts generated by members of these communities. All aspects, including mood, written content and writing style, are found to be significantly different between the two types of communities. Sentiment analysis shows the clinical group has lower valence than people in the control group. For language styles and topics, statistical tests reject the hypothesis of equality on psycholinguistic processes and topics between the two groups. We show good predictive validity in depression classification using topics and psycholinguistic clues as features. Clear discrimination between writing styles and contents, with good predictive power, is an important step in understanding social media and its use in mental health. },
        OWNER = { thinng },
        TIMESTAMP = { 2014.03.31 },
        URL = { http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6784326 },
    }
J
  • Regularizing Topic Discovery in EMRs with Side Information by Using Hierarchical Bayesian Models
    Li, C., Rana, S., Phung, D. and Venkatesh, S.. In Proceedings of International Conference on Pattern Recognition (ICPR), 2014. [ | ]
    We propose a novel hierarchical Bayesian framework, the word-distance-dependent Chinese restaurant franchise (wddCRF), for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application to Electronic Medical Records (EMRs). Typically, an EMR dataset consists of several patients (documents) and each patient contains many diagnosis codes (words). We exploit the side information available in the form of a semantic tree structure among the diagnosis codes for semantically coherent disease topic discovery. We introduce novel functions to compute word-to-word distances when side information is available in the form of tree structures. We derive an efficient inference method for the wddCRF using an MCMC technique. We evaluate on a real-world medical dataset consisting of about 1000 patients with PolyVascular disease. Compared with the popular topic analysis tool, the hierarchical Dirichlet process (HDP), our model discovers topics which are superior in terms of both qualitative and quantitative measures.
    @INPROCEEDINGS { li_rana_phung_venkatesh_icpr14,
        TITLE = { Regularizing Topic Discovery in EMRs with Side Information by Using Hierarchical Bayesian Models },
        AUTHOR = { Li, C. and Rana, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of International Conference on Pattern Recognition (ICPR) },
        YEAR = { 2014 },
        ABSTRACT = { We propose a novel hierarchical Bayesian framework, the word-distance-dependent Chinese restaurant franchise (wddCRF), for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application to Electronic Medical Records (EMRs). Typically, an EMR dataset consists of several patients (documents) and each patient contains many diagnosis codes (words). We exploit the side information available in the form of a semantic tree structure among the diagnosis codes for semantically coherent disease topic discovery. We introduce novel functions to compute word-to-word distances when side information is available in the form of tree structures. We derive an efficient inference method for the wddCRF using an MCMC technique. We evaluate on a real-world medical dataset consisting of about 1000 patients with PolyVascular disease. Compared with the popular topic analysis tool, the hierarchical Dirichlet process (HDP), our model discovers topics which are superior in terms of both qualitative and quantitative measures. },
        OWNER = { chengl },
        TIMESTAMP = { 2014.03.27 },
    }
C
  • Effect of Mood, Social Connectivity and Age in Online Depression Community via Topic and Linguistic Analysis
    Dao, Bo, Nguyen, Thin, Phung, Dinh and Venkatesh, Svetha. In International Conference on Web Information System Engineering (WISE 2014), Thessaloniki, Greece, 2014. [ | | pdf?]
    Depression afflicts one in four people during their lives. Several studies have shown that for the isolated and mentally ill, the Web and social media provide effective platforms for support and treatment as well as for acquiring scientific, clinical understanding of this mental condition. More and more individuals affected by depression join online communities to seek information, express themselves, share their concerns and look for support [11]. For the first time, we collect and study a large online depression community of more than 12,000 active members from Live Journal. We examine the effect of mood, social connectivity and age on the online messages authored by members in an online depression community. The posts are considered in two aspects: what is written (topic) and how it is written (language style). We use statistical and machine learning methods to discriminate the posts made by bloggers in low versus high valence mood, in different age categories and with different degrees of social connectivity. Using statistical tests, language styles are found to be significantly different between low and high valence cohorts, whilst topics are significantly different between people with different degrees of social connectivity. High performance is achieved for low versus high valence post classification using writing style as features. The finding suggests the potential of using social media in depression screening, especially in online settings.
    @INPROCEEDINGS { dao_nguyen_phung_venkatesh_wise14,
        TITLE = { Effect of Mood, Social Connectivity and Age in Online Depression Community via Topic and Linguistic Analysis },
        AUTHOR = { Dao, Bo and Nguyen, Thin and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { International Conference on Web Information System Engineering (WISE 2014) },
        YEAR = { 2014 },
        ADDRESS = { Thessaloniki, Greece },
        ABSTRACT = { Depression afflicts one in four people during their lives. Several studies have shown that for the isolated and mentally ill, the Web and social media provide effective platforms for support and treatment as well as for acquiring scientific, clinical understanding of this mental condition. More and more individuals affected by depression join online communities to seek information, express themselves, share their concerns and look for support [11]. For the first time, we collect and study a large online depression community of more than 12,000 active members from Live Journal. We examine the effect of mood, social connectivity and age on the online messages authored by members in an online depression community. The posts are considered in two aspects: what is written (topic) and how it is written (language style). We use statistical and machine learning methods to discriminate the posts made by bloggers in low versus high valence mood, in different age categories and with different degrees of social connectivity. Using statistical tests, language styles are found to be significantly different between low and high valence cohorts, whilst topics are significantly different between people with different degrees of social connectivity. High performance is achieved for low versus high valence post classification using writing style as features. The finding suggests the potential of using social media in depression screening, especially in online settings. },
        COMMENT = { coauthor },
        OWNER = { dbdao },
        TIMESTAMP = { 2014.07.11 },
        URL = { 2014\conferences\dao_nguyen_phung_venkatesh_wise14.pdf },
    }
C
  • Affective, Linguistic and Topic Patterns in Online Autism Communities
    Nguyen, Thin, Duong, Thi, Phung, Dinh and Venkatesh, Svetha. In International Conference on Web Information System Engineering (WISE 2014), Thessaloniki, Greece, 2014. [ | | pdf?]
    Online communities offer a platform to support and discuss health issues. They provide a more accessible way to bring together people with the same concerns or interests. This paper aims to study the characteristics of online autism communities (Clinical) in comparison with other online communities (Control), using data from 110 LiveJournal weblog communities. Using machine learning techniques, we analyze these online autism communities comprehensively, studying three key aspects expressed in the blog posts made by members of the communities: sentiment, topics and language style. Sentiment analysis shows that the sentiment of the clinical group has lower valence, indicative of poorer moods than in the control group. Topics and language style are shown to be good predictors of autism posts. The results show the potential of social media in medical studies for a broad range of purposes, such as screening, monitoring and subsequently providing support for fragile communities.
    @INPROCEEDINGS { nguyen_duong_phung_venkatesh_wise14,
        TITLE = { Affective, Linguistic and Topic Patterns in Online Autism Communities },
        AUTHOR = { Nguyen, Thin and Duong, Thi and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { International Conference on Web Information System Engineering (WISE 2014) },
        YEAR = { 2014 },
        ADDRESS = { Thessaloniki, Greece },
        ABSTRACT = { Online communities offer a platform to support and discuss health issues. They provide a more accessible way to bring together people with the same concerns or interests. This paper aims to study the characteristics of online autism communities (Clinical) in comparison with other online communities (Control), using data from 110 LiveJournal weblog communities. Using machine learning techniques, we analyze these online autism communities comprehensively, studying three key aspects expressed in the blog posts made by members of the communities: sentiment, topics and language style. Sentiment analysis shows that the sentiment of the clinical group has lower valence, indicative of poorer moods than in the control group. Topics and language style are shown to be good predictors of autism posts. The results show the potential of social media in medical studies for a broad range of purposes, such as screening, monitoring and subsequently providing support for fragile communities. },
        COMMENT = { coauthor },
        OWNER = { dbdao },
        TIMESTAMP = { 2014.07.11 },
        URL = { 2014\conferences\nguyen_duong_phung_venkatesh_wise14.pdf },
    }
C
  • A Bayesian Nonparametric Framework for Activity Recognition using Accelerometer Data
    Nguyen, T.C., Gupta, S., Venkatesh, S. and Phung, D.. In Proceedings of the 22nd International Conference on Pattern Recognition (ICPR), pages 2017-2022, 2014. [ | ]
    Monitoring the daily physical activity of humans plays an important role in preventing disease and improving health. In this paper, we demonstrate a framework for monitoring the physical activity level in daily life. We collect the data using accelerometer sensors in a realistic setting, without any supervision. The ground truth of activities is provided by the participants themselves using an experience sampling application running on mobile phones. The original data is discretized by the hierarchical Dirichlet process (HDP) into different activity levels, and the number of levels is inferred automatically. We validate the accuracy of the extracted patterns by using them for multi-label classification of activities and demonstrate high performance on various standard evaluation metrics. We further show that the extracted patterns are highly correlated with the daily routines of the users.
    @INPROCEEDINGS { nguyen_gupta_venkatesh_phung_icpr14,
        TITLE = { A {B}ayesian Nonparametric Framework for Activity Recognition using Accelerometer Data },
        AUTHOR = { Nguyen, T.C. and Gupta, S. and Venkatesh, S. and Phung, D. },
        BOOKTITLE = { Proceedings of the 22nd International Conference on Pattern Recognition (ICPR) },
        YEAR = { 2014 },
        PAGES = { 2017--2022 },
        ABSTRACT = { Monitoring the daily physical activity of humans plays an important role in preventing disease and improving health. In this paper, we demonstrate a framework for monitoring the physical activity level in daily life. We collect the data using accelerometer sensors in a realistic setting, without any supervision. The ground truth of activities is provided by the participants themselves using an experience sampling application running on mobile phones. The original data is discretized by the hierarchical Dirichlet process (HDP) into different activity levels, and the number of levels is inferred automatically. We validate the accuracy of the extracted patterns by using them for multi-label classification of activities and demonstrate high performance on various standard evaluation metrics. We further show that the extracted patterns are highly correlated with the daily routines of the users. },
        OWNER = { ctng },
        TIMESTAMP = { 2014.02.21 },
    }
C
  • Nonparametric Discovery of Learning Patterns and Autism Subgroups from Therapeutic Data
    Vellanki, P., Duong, T., Venkatesh, S. and Phung, D.. In Proceedings of the 22nd International Conference on Pattern Recognition (ICPR), pages 1829-1833, 2014. [ | ]
    Autism Spectrum Disorder (ASD) is growing at a staggering rate, yet little is known about the cause of this condition. Inferring learning patterns from therapeutic performance data, and subsequently clustering ASD children into subgroups, is important to understand this domain and, more importantly, to inform evidence-based intervention. However, this data-driven task was difficult in the past due to the insufficiency of data to perform reliable analysis. For the first time, using data from a recent application for early intervention in autism (TOBY Playpad), whose download count now exceeds 4500, we present in this paper the automatic discovery of learning patterns across 32 skills in sensory, imitation and language. We use unsupervised learning methods for this task, but a notorious problem with existing methods is the need to specify the number of patterns in advance, which in our case is even more difficult due to the complexity of the data. To this end, we appeal to recent Bayesian nonparametric methods, in particular Bayesian nonparametric factor analysis. This model uses the Indian Buffet Process (IBP) as a prior on a binary matrix with an infinite number of columns to allocate groups of intervention skills to children. The optimal number of learning patterns as well as the subgroup assignments are inferred automatically from the data. Our experimental results follow an exploratory approach, presenting the newly discovered learning patterns. To provide quantitative results, we also report clustering evaluations against K-means and NMF. In addition to the novelty of this new problem, we demonstrate the suitability of Bayesian nonparametric models over parametric rivals.
    @INPROCEEDINGS { vellanki_duong_venkatesh_phung_icpr14,
        TITLE = { Nonparametric Discovery of Learning Patterns and Autism Subgroups from Therapeutic Data },
        AUTHOR = { Vellanki, P. and Duong, T. and Venkatesh, S. and Phung, D. },
        BOOKTITLE = { Proceedings of the 22nd International Conference on Pattern Recognition (ICPR) },
        YEAR = { 2014 },
        PAGES = { 1829--1833 },
        ABSTRACT = { Autism Spectrum Disorder (ASD) is growing at a staggering rate, yet little is known about the cause of this condition. Inferring learning patterns from therapeutic performance data, and subsequently clustering ASD children into subgroups, is important to understand this domain and, more importantly, to inform evidence-based intervention. However, this data-driven task was difficult in the past due to the insufficiency of data to perform reliable analysis. For the first time, using data from a recent application for early intervention in autism (TOBY Playpad), whose download count now exceeds 4500, we present in this paper the automatic discovery of learning patterns across 32 skills in sensory, imitation and language. We use unsupervised learning methods for this task, but a notorious problem with existing methods is the need to specify the number of patterns in advance, which in our case is even more difficult due to the complexity of the data. To this end, we appeal to recent Bayesian nonparametric methods, in particular Bayesian nonparametric factor analysis. This model uses the Indian Buffet Process (IBP) as a prior on a binary matrix with an infinite number of columns to allocate groups of intervention skills to children. The optimal number of learning patterns as well as the subgroup assignments are inferred automatically from the data. Our experimental results follow an exploratory approach, presenting the newly discovered learning patterns. To provide quantitative results, we also report clustering evaluations against K-means and NMF. In addition to the novelty of this new problem, we demonstrate the suitability of Bayesian nonparametric models over parametric rivals. },
        OWNER = { pvellank },
        TIMESTAMP = { 2014.04.11 },
    }
C
  • Risk stratification using data from electronic medical records better predicts suicide risks than clinician assessments
    T. Tran, W. Luo, D. Phung, H. Richard, M. Berk, L. Kennedy and S. Venkatesh. BMC Psychiatry, 14(1):76, 2014. [ | | pdf]
    Background: To date, our ability to accurately identify patients at high risk of suicidal behaviour, and thus to target interventions, has been fairly limited. This study examined a large pool of factors potentially associated with suicide risk from the comprehensive electronic medical record (EMR), and derived a predictive model for 1–6 month risk. Methods: 7,399 patients undergoing suicide risk assessment were followed up for 180 days. The dataset was divided into derivation and validation cohorts of 4,911 and 2,488 patients respectively. Clinicians used an 18-point checklist of known risk factors to divide patients into low, medium or high risk. Their predictive ability was compared with that of a risk stratification model derived from the EMR data. The model was based on the continuation-ratio ordinal regression method coupled with lasso (least absolute shrinkage and selection operator). Results: In the year prior to suicide assessment, 66.8% of patients attended the emergency department (ED) and 41.8% had at least one hospital admission. Administrative and demographic data, along with information on prior self-harm episodes as well as mental and physical health diagnoses, were predictive of high-risk suicidal behaviour. Clinicians using the 18-point checklist were relatively poor at predicting patients at high risk within 3 months (AUC 0.58, 95% CIs: 0.50–0.66). The model derived from the EMR was superior (AUC 0.79, 95% CIs: 0.72–0.84). At a specificity of 0.72 (95% CIs: 0.70–0.73), the EMR model had a sensitivity of 0.70 (95% CIs: 0.56–0.83). Conclusion: Predictive models applied to data from the EMR could improve risk stratification of patients presenting with potential suicidal behaviour. The predictive factors include known risks for suicide, but also other information relating to general health and health service utilisation. Keywords: Suicide risk; Electronic medical record; Predictive models
    @ARTICLE { Tran_etal_bmc14,
        TITLE = { Risk stratification using data from electronic medical records better predicts suicide risks than clinician assessments },
        AUTHOR = { T. Tran and W. Luo and D. Phung and H. Richard and M. Berk and L. Kennedy and S. Venkatesh },
        JOURNAL = { BMC Psychiatry },
        YEAR = { 2014 },
        NUMBER = { 1 },
        PAGES = { 76 },
        VOLUME = { 14 },
        ABSTRACT = { Background: To date, our ability to accurately identify patients at high risk of suicidal behaviour, and thus to target interventions, has been fairly limited. This study examined a large pool of factors potentially associated with suicide risk from the comprehensive electronic medical record (EMR), and derived a predictive model for 1–6 month risk. Methods: 7,399 patients undergoing suicide risk assessment were followed up for 180 days. The dataset was divided into derivation and validation cohorts of 4,911 and 2,488 patients respectively. Clinicians used an 18-point checklist of known risk factors to divide patients into low, medium or high risk. Their predictive ability was compared with that of a risk stratification model derived from the EMR data. The model was based on the continuation-ratio ordinal regression method coupled with lasso (least absolute shrinkage and selection operator). Results: In the year prior to suicide assessment, 66.8% of patients attended the emergency department (ED) and 41.8% had at least one hospital admission. Administrative and demographic data, along with information on prior self-harm episodes as well as mental and physical health diagnoses, were predictive of high-risk suicidal behaviour. Clinicians using the 18-point checklist were relatively poor at predicting patients at high risk within 3 months (AUC 0.58, 95% CIs: 0.50–0.66). The model derived from the EMR was superior (AUC 0.79, 95% CIs: 0.72–0.84). At a specificity of 0.72 (95% CIs: 0.70–0.73), the EMR model had a sensitivity of 0.70 (95% CIs: 0.56–0.83). Conclusion: Predictive models applied to data from the EMR could improve risk stratification of patients presenting with potential suicidal behaviour. The predictive factors include known risks for suicide, but also other information relating to general health and health service utilisation. Keywords: Suicide risk; Electronic medical record; Predictive models },
        OWNER = { dinh },
        PUBLISHER = { BioMed Central Ltd },
        TIMESTAMP = { 2014.03.21 },
        URL = { http://www.biomedcentral.com/1471-244X/14/76 },
    }
J
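The continuation-ratio ordinal regression used in the suicide-risk paper above works by expanding each ordinal outcome into a sequence of conditional binary problems ("given the patient reached level k, did they go beyond it?"), which can then be fitted with any lasso-penalized binary classifier. A minimal sketch of that expansion step in Python; the three risk levels, feature values and helper name here are illustrative, not the paper's actual coding:

```python
# Continuation-ratio expansion: an ordinal outcome y in {0..K-1} becomes
# up to K-1 binary rows, one per threshold k, each asking
# "given y >= k, is y > k?". The expanded rows can be fed to any
# penalized (e.g. lasso) binary classifier.
def continuation_ratio_expand(X, y, n_levels):
    rows, thresholds, targets = [], [], []
    for xi, yi in zip(X, y):
        for k in range(n_levels - 1):
            if yi < k:          # subject already "exited" at an earlier level
                break
            rows.append(xi)
            thresholds.append(k)
            targets.append(1 if yi > k else 0)
    return rows, thresholds, targets

# toy data: 2 features, ordinal risk in {0: low, 1: medium, 2: high}
X = [[0.2, 1.0], [0.9, 0.1], [0.5, 0.5]]
y = [0, 2, 1]
rows, ks, t = continuation_ratio_expand(X, y, n_levels=3)
```

Fitting a single L1-penalized logistic regression over all expanded rows, with the threshold index as an extra covariate, then recovers a continuation-ratio model in the spirit of the one described in the abstract.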
  • Machine-learning prediction of cancer survival: a retrospective study using electronic administrative records and a cancer registry
    S. Gupta, T. Tran, W. Luo, D. Phung, R.L. Kennedy, A. Broad, D. Campbell, D. Kipp, M. Singh, M. Khasraw, L. Matheson, D.M. Ashley and S. Venkatesh. BMJ Open Oncology, 4(3), 2014. [ | | pdf]
    Objectives: Using the prediction of cancer outcome as a model, we have tested the hypothesis that by analysing routinely collected digital data contained in an electronic administrative record (EAR) using machine-learning techniques, we could enhance conventional methods of predicting clinical outcomes. Setting: A regional cancer centre in Australia. Participants: Disease-specific data from a purpose-built cancer registry (Evaluation of Cancer Outcomes (ECO)) from 869 patients were used to predict survival at 6, 12 and 24 months. The model was validated with data from a further 94 patients, and the results compared with the assessments of five specialist oncologists. Machine-learning prediction using ECO data was compared with that using EAR data and with a model combining ECO and EAR data. Primary and secondary outcome measures: Survival prediction accuracy in terms of the area under the receiver operating characteristic curve (AUC). Results: The ECO model yielded AUCs of 0.87 (95% CI 0.848 to 0.890) at 6 months, 0.796 (95% CI 0.774 to 0.823) at 12 months and 0.764 (95% CI 0.737 to 0.789) at 24 months, each slightly better than the performance of the clinician panel. The model performed consistently across a range of cancers, including rare cancers. Combining ECO and EAR data yielded better prediction than the ECO-based model (AUCs ranging from 0.757 to 0.997 at 6 months, 0.689 to 0.988 at 12 months and 0.713 to 0.973 at 24 months). The best prediction was for genitourinary, head and neck, lung, skin, and upper gastrointestinal tumours. Conclusions: Machine learning applied to information from a disease-specific (cancer) database and the EAR can be used to predict clinical outcomes. Importantly, the approach described made use of digital data that is already routinely collected but underexploited by clinical health systems.
    @ARTICLE { Gupta_etal_bmj14,
        TITLE = { Machine-learning prediction of cancer survival: a retrospective study using electronic administrative records and a cancer registry },
        AUTHOR = { S. Gupta and T. Tran and W. Luo and D. Phung and R.L. Kennedy and A. Broad and D. Campbell and D. Kipp and M. Singh and M. Khasraw and L. Matheson and D.M. Ashley and S. Venkatesh },
        JOURNAL = { BMJ Open Oncology },
        YEAR = { 2014 },
        NUMBER = { 3 },
        VOLUME = { 4 },
        ABSTRACT = { Objectives: Using the prediction of cancer outcome as a model, we have tested the hypothesis that by analysing routinely collected digital data contained in an electronic administrative record (EAR) using machine-learning techniques, we could enhance conventional methods of predicting clinical outcomes. Setting: A regional cancer centre in Australia. Participants: Disease-specific data from a purpose-built cancer registry (Evaluation of Cancer Outcomes (ECO)) from 869 patients were used to predict survival at 6, 12 and 24 months. The model was validated with data from a further 94 patients, and the results compared with the assessments of five specialist oncologists. Machine-learning prediction using ECO data was compared with that using EAR data and with a model combining ECO and EAR data. Primary and secondary outcome measures: Survival prediction accuracy in terms of the area under the receiver operating characteristic curve (AUC). Results: The ECO model yielded AUCs of 0.87 (95% CI 0.848 to 0.890) at 6 months, 0.796 (95% CI 0.774 to 0.823) at 12 months and 0.764 (95% CI 0.737 to 0.789) at 24 months, each slightly better than the performance of the clinician panel. The model performed consistently across a range of cancers, including rare cancers. Combining ECO and EAR data yielded better prediction than the ECO-based model (AUCs ranging from 0.757 to 0.997 at 6 months, 0.689 to 0.988 at 12 months and 0.713 to 0.973 at 24 months). The best prediction was for genitourinary, head and neck, lung, skin, and upper gastrointestinal tumours. Conclusions: Machine learning applied to information from a disease-specific (cancer) database and the EAR can be used to predict clinical outcomes. Importantly, the approach described made use of digital data that is already routinely collected but underexploited by clinical health systems. },
        DOI = { 10.1136/bmjopen-2013-004007 },
        OWNER = { dinh },
        TIMESTAMP = { 2014.03.21 },
        URL = { http://bmjopen.bmj.com/content/4/3/e004007.abstract },
    }
J
  • Fixed-lag Particle Filter for Continuous Context Discovery Using Indian Buffet Process
    Nguyen, T. C., Gupta, S., Venkatesh, S. and Phung, D.. In 2014 IEEE International Conference on Pervasive Computing and Communications (PERCOM), pages 20-28, 2014. [ | ]
    Exploiting context from stream data in pervasive environments remains a challenge. We aim to extract proximal context from Bluetooth stream data, using an incremental, Bayesian nonparametric framework that estimates the number of contexts automatically. Unlike current approaches that can only provide final proximal grouping, our method provides proximal grouping and membership of users over time. Additionally, it provides an efficient online inference. We construct co-location matrix over time using Bluetooth data. A Poisson-exponential model is used to factorize this matrix into a factor matrix, interpreted as proximal groups, and a coefficient matrix that indicates factor usage. The coefficient matrix follows the Indian Buffet Process prior, which estimates the number of factors automatically. The non-negativity and sparsity of factors are enforced by using the exponential distribution to generate the factors. We propose a fixed-lag particle filter algorithm to process data incrementally. We compare the incremental inference (particle filter) with full batch inference (Gibbs sampling) in terms of normalized factorization error and execution time. The normalized error obtained through our incremental inference is comparable to that of full batch inference, whilst it is more than 100 times faster. The discovered factors have similar meaning to the results of the Louvain method – a popular method for community detection.
    @INPROCEEDINGS { nguyen_gupta_venkatesh_phung_percom14,
        TITLE = { Fixed-lag Particle Filter for Continuous Context Discovery Using {I}ndian Buffet Process },
        AUTHOR = { Nguyen, T. C. and Gupta, S. and Venkatesh, S. and Phung, D. },
        BOOKTITLE = { 2014 IEEE International Conference on Pervasive Computing and Communications (PerCom) },
        YEAR = { 2014 },
        PAGES = { 20--28 },
        ABSTRACT = { Exploiting context from stream data in pervasive environments remains a challenge. We aim to extract proximal context from Bluetooth stream data, using an incremental, Bayesian nonparametric framework that estimates the number of contexts automatically. Unlike current approaches that can only provide final proximal grouping, our method provides proximal grouping and membership of users over time. Additionally, it provides an efficient online inference. We construct co-location matrix over time using Bluetooth data. A Poisson-exponential model is used to factorize this matrix into a factor matrix, interpreted as proximal groups, and a coefficient matrix that indicates factor usage. The coefficient matrix follows the Indian Buffet Process prior, which estimates the number of factors automatically. The non-negativity and sparsity of factors are enforced by using the exponential distribution to generate the factors. We propose a fixed-lag particle filter algorithm to process data incrementally. We compare the incremental inference (particle filter) with full batch inference (Gibbs sampling) in terms of normalized factorization error and execution time. The normalized error obtained through our incremental inference is comparable to that of full batch inference, whilst it is more than 100 times faster. The discovered factors have similar meaning to the results of the Louvain method – a popular method for community detection. },
        FILE = { :papers\\phung\\nguyen_gupta_venkatesh_phung_percom14.pdf:PDF },
        OWNER = { ctng },
        TIMESTAMP = { 2013.12.14 },
    }
C
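The Indian Buffet Process prior that the PerCom paper above places on its coefficient matrix (and which the ICPR autism-subgroup paper uses as well) can be simulated with the standard "customers and dishes" construction. A minimal numpy sketch; the concentration parameter, seed and sizes are arbitrary illustrative choices, not values from either paper:

```python
import numpy as np

def sample_ibp(n_customers, alpha, rng):
    """Draw a binary feature-allocation matrix Z from an IBP(alpha) prior.

    Customer 1 samples Poisson(alpha) new dishes; customer i takes each
    existing dish k with probability m_k / i (m_k = number of previous
    takers of dish k), then samples Poisson(alpha / i) new dishes.
    """
    dishes = []                      # dishes[k] = list of customer indices
    for i in range(1, n_customers + 1):
        for takers in dishes:        # revisit existing dishes
            if rng.random() < len(takers) / i:
                takers.append(i - 1)
        for _ in range(rng.poisson(alpha / i)):   # try new dishes
            dishes.append([i - 1])
    Z = np.zeros((n_customers, len(dishes)), dtype=int)
    for k, takers in enumerate(dishes):
        Z[takers, k] = 1
    return Z

rng = np.random.default_rng(0)
Z = sample_ibp(n_customers=10, alpha=2.0, rng=rng)
# the number of columns (latent factors) is not fixed in advance:
# it is a random quantity growing roughly as alpha * log(n_customers)
```

This is the property the abstract relies on: the number of factors is inferred rather than specified, since the prior itself places mass on matrices with any finite number of active columns.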
  • Intervention-Driven Predictive Framework for Modeling Healthcare Data
    Rana, S., Gupta, S., Phung, D. and Venkatesh, S.. In Proc. of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Tainan, Taiwan, May 2014. [ | ]
    @INPROCEEDINGS { rana_gupta_phung_venkatesh_pakdd14,
        TITLE = { Intervention-Driven Predictive Framework for Modeling Healthcare Data },
        AUTHOR = { Rana, S. and Gupta, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proc. of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2014 },
        ADDRESS = { Tainan, Taiwan },
        MONTH = { May },
        OWNER = { ctng },
        TIMESTAMP = { 2014.01.05 },
    }
C
2013
  • Connectivity, Online Social Capital and Mood: A Bayesian Nonparametric Analysis
    Phung, D., Gupta, S. K., Nguyen, T. and Venkatesh, S.. IEEE Transactions on Multimedia (TMM), 15:1316-1325, May 2013. [ | | pdf]
    Social capital, indicative of community interaction and support, is intrinsically linked to mental health. An increasing online presence is now the norm. Whilst social capital and its impact on social networks have been examined, its underlying connection to emotional responses such as mood has not been investigated. This paper studies this phenomenon, revisiting the concept of “online social capital” in social media communities using measurable aspects of social participation and social support. We establish the link between online capital derived from social media and mood, demonstrating results for different cohorts of social capital and social connectivity. We use novel Bayesian nonparametric factor analysis to extract the shared and individual factors in mood transition across groups of users with different levels of connectivity, quantifying the patterns and degree of mood transitions. Using more than 1.6 million users from LiveJournal, we show quantitatively that groups with lower social capital have fewer positive moods and more negative moods than groups with higher social capital. We show similar effects in mood transitions. We establish a framework for how social media can be used as a barometer for mood. The significance lies in the importance of online social capital to mental well-being overall. In establishing the link between mood and social capital in online communities, this work may suggest the foundation of new systems to monitor online mental well-being.
    @ARTICLE { phung_gupta_nguyen_venkatesh_tmm13,
        TITLE = { Connectivity, Online Social Capital and Mood: A Bayesian Nonparametric Analysis },
        AUTHOR = { Phung, D. and Gupta, S. K. and Nguyen, T. and Venkatesh, S. },
        JOURNAL = { IEEE Transactions on Multimedia (TMM) },
        YEAR = { 2013 },
        MONTH = { May },
        PAGES = { 1316--1325 },
        VOLUME = { 15 },
        ABSTRACT = { Social capital, indicative of community interaction and support, is intrinsically linked to mental health. An increasing online presence is now the norm. Whilst social capital and its impact on social networks have been examined, its underlying connection to emotional responses such as mood has not been investigated. This paper studies this phenomenon, revisiting the concept of “online social capital” in social media communities using measurable aspects of social participation and social support. We establish the link between online capital derived from social media and mood, demonstrating results for different cohorts of social capital and social connectivity. We use novel Bayesian nonparametric factor analysis to extract the shared and individual factors in mood transition across groups of users with different levels of connectivity, quantifying the patterns and degree of mood transitions. Using more than 1.6 million users from LiveJournal, we show quantitatively that groups with lower social capital have fewer positive moods and more negative moods than groups with higher social capital. We show similar effects in mood transitions. We establish a framework for how social media can be used as a barometer for mood. The significance lies in the importance of online social capital to mental well-being overall. In establishing the link between mood and social capital in online communities, this work may suggest the foundation of new systems to monitor online mental well-being. },
        ISSN = { 1520-9210 },
        LANGUAGE = { English },
        TIMESTAMP = { 2013.04.16 },
        URL = { http://prada-research.net/~dinh/uploads/Main/HomePage/phung_gupta_nguyen_venkatesh_tmm13.pdf },
    }
J
  • Bayesian Nonparametric Modelling of Correlated Data Sources and Applications (poster)
    Phung, D.. In International Conference on Bayesian Nonparametrics, Amsterdam, The Netherlands, June 10-14 2013. [ | | code | poster]
    When one considers realistic multimodal data, covariates are rich and yet tend to have a natural correlation with one another; for example: tags and their associated multimedia contents; patients' demographic information, medical history and drug usage; social users' profiles and friendship networks. The presence of rich and naturally correlated covariates calls for the need to model their correlation with nonparametric models, without reverting to parametric assumptions. This paper presents a fully Bayesian nonparametric approach to the problem of jointly clustering data sources and modeling their correlation. In our approach, we view context as distributions over some index space, governed by the topics discovered from the primary data source (content), and model both contents and contexts jointly. We impose a conditional structure in which contents provide the topics, upon which contexts are conditionally distributed. Distributions over topic parameters are modelled according to a Dirichlet process (DP). The stick-breaking representation gives rise to explicit realizations of topic atoms, which we use as an indexing mechanism to induce conditional random mixture distributions on the context observation spaces. Loosely speaking, we use a stochastic process, the DP, to conditionally `index' other stochastic processes. The latter can be designed on any suitable family of stochastic processes to suit the modelling needs or data types of the contexts (such as Beta or Gaussian processes). The Dirichlet process is of course an obvious choice and is again employed in this work. In typical hierarchical Bayesian style, we also provide the model in the grouped-data setting, where contents and contexts appear in groups (for example, a collection of text documents or images embedded in time or space). Our model can be viewed as a generalization of the hierarchical Dirichlet process (HDP) [2] and the recent nested Dirichlet process (nDP) [1].
We develop an auxiliary conditional Gibbs sampler in which both topic and context atoms are marginalized out. We demonstrate the framework on synthetic datasets and various real large-scale datasets, with an emphasis on the application perspective of the models. In particular, we demonstrate a) an application in text modelling for modelling topics that are sensitive to time, using the NIPS and PNAS datasets; b) an application of the model in computer vision for inferring local and global movement patterns, using the MIT dataset consisting of real video data collected at a traffic scene; and c) an application in medical data analysis in which we model latent aspects of diseases and their progression, together with the task of re-admission prediction.
    @INPROCEEDINGS { phung_bnp13,
        TITLE = { {B}ayesian Nonparametric Modelling of Correlated Data Sources and Applications (poster) },
        AUTHOR = { Phung, D. },
        BOOKTITLE = { International Conference on Bayesian Nonparametrics },
        YEAR = { 2013 },
        ADDRESS = { Amsterdam, The Netherlands },
        MONTH = { June 10-14 },
        ABSTRACT = { When one considers realistic multimodal data, covariates are rich and yet tend to have a natural correlation with one another; for example: tags and their associated multimedia contents; patients' demographic information, medical history and drug usage; social users' profiles and friendship networks. The presence of rich and naturally correlated covariates calls for the need to model their correlation with nonparametric models, without reverting to parametric assumptions. This paper presents a fully Bayesian nonparametric approach to the problem of jointly clustering data sources and modeling their correlation. In our approach, we view context as distributions over some index space, governed by the topics discovered from the primary data source (content), and model both contents and contexts jointly. We impose a conditional structure in which contents provide the topics, upon which contexts are conditionally distributed. Distributions over topic parameters are modelled according to a Dirichlet process (DP). The stick-breaking representation gives rise to explicit realizations of topic atoms, which we use as an indexing mechanism to induce conditional random mixture distributions on the context observation spaces. Loosely speaking, we use a stochastic process, the DP, to conditionally `index' other stochastic processes. The latter can be designed on any suitable family of stochastic processes to suit the modelling needs or data types of the contexts (such as Beta or Gaussian processes). The Dirichlet process is of course an obvious choice and is again employed in this work. In typical hierarchical Bayesian style, we also provide the model in the grouped-data setting, where contents and contexts appear in groups (for example, a collection of text documents or images embedded in time or space). Our model can be viewed as a generalization of the hierarchical Dirichlet process (HDP) [2] and the recent nested Dirichlet process (nDP) [1].
We develop an auxiliary conditional Gibbs sampler in which both topic and context atoms are marginalized out. We demonstrate the framework on synthetic datasets and various real large-scale datasets, with an emphasis on the application perspective of the models. In particular, we demonstrate a) an application in text modelling for modelling topics that are sensitive to time, using the NIPS and PNAS datasets; b) an application of the model in computer vision for inferring local and global movement patterns, using the MIT dataset consisting of real video data collected at a traffic scene; and c) an application in medical data analysis in which we model latent aspects of diseases and their progression, together with the task of re-admission prediction. },
        CODE = { http://prada-research.net/~dinh/index.php?n=Main.Code#HDP_code },
        OWNER = { dinh },
        POSTER = { http://prada-research.net/~dinh/uploads/Main/Publications/A0_poster_BNP13.pdf },
        TIMESTAMP = { 2013.03.01 },
    }
C
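The stick-breaking construction mentioned in the abstract above (v_k ~ Beta(1, alpha); pi_k = v_k * prod_{j<k}(1 - v_j)) is easy to sketch. The snippet below is a minimal illustration only, not the authors' code; the truncation level and function name are my own.

```python
import random

def stick_breaking_weights(alpha, n_atoms, rng=None):
    """Truncated stick-breaking construction of Dirichlet process weights:
    v_k ~ Beta(1, alpha); pi_k = v_k * prod_{j<k} (1 - v_j)."""
    rng = rng or random.Random(0)
    weights = []
    remaining = 1.0  # length of the stick still unbroken
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights
```

Larger alpha breaks the stick into many small pieces (mass spread over many atoms/topics); smaller alpha concentrates mass on a few atoms.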
  • Extraction of Latent Patterns and Contexts from Social Honest Signals Using Hierarchical Dirichlet Processes
    Nguyen, Thuong, Phung, Dinh, Gupta, Sunil and Venkatesh, Svetha. In IEEE Intl. Conf. on Pervasive Computing and Communications (PERCOM), pages 47-55, 2013. [ | | pdf | code]
    A fundamental task in pervasive computing is reliable acquisition of contexts from sensor data. This is crucial to the operation of smart pervasive systems and services so that they might behave efficiently and appropriately upon a given context. Simple forms of context can often be extracted directly from raw data. Equally important, if not more so, are the hidden contexts and patterns buried inside the data, which are more challenging to discover. Most existing approaches borrow methods and techniques from machine learning and predominantly employ parametric unsupervised learning and clustering techniques. Being parametric, a severe drawback of these methods is the requirement to specify the number of latent patterns in advance. In this paper, we explore the use of Bayesian nonparametric methods, a recent data modelling framework in machine learning, to infer latent patterns from sensor data acquired in a pervasive setting. Under this formalism, nonparametric prior distributions are used for the data generative process, and thus they allow the number of latent patterns to be learned automatically and to grow with the data - as more data comes in, the model complexity can grow to explain new and unseen patterns. In particular, we make use of the hierarchical Dirichlet process (HDP) to infer atomic activities and interaction patterns from honest signals collected from sociometric badges. We show how data from these sensors can be represented and learned with the HDP. We illustrate insights into atomic patterns learned by the model and use them to achieve high-performance clustering. We also demonstrate the framework on the popular Reality Mining dataset, illustrating the ability of the model to automatically infer typical social groups in this dataset. Finally, our framework is generic and applicable to a much wider range of problems in pervasive computing where one needs to infer high-level, latent patterns and contexts from sensor data.
    @INPROCEEDINGS { nguyen_phung_gupta_venkatesh_percom13,
        AUTHOR = { Nguyen, Thuong and Phung, Dinh and Gupta, Sunil and Venkatesh, Svetha },
        TITLE = { Extraction of Latent Patterns and Contexts from Social Honest Signals Using Hierarchical {D}irichlet Processes },
        BOOKTITLE = { IEEE Intl. Conf. on Pervasive Computing and Communications (PERCOM) },
        YEAR = { 2013 },
        PAGES = { 47-55 },
        ABSTRACT = { A fundamental task in pervasive computing is reliable acquisition of contexts from sensor data. This is crucial to the operation of smart pervasive systems and services so that they might behave efficiently and appropriately upon a given context. Simple forms of context can often be extracted directly from raw data. Equally important, if not more so, are the hidden contexts and patterns buried inside the data, which are more challenging to discover. Most existing approaches borrow methods and techniques from machine learning and predominantly employ parametric unsupervised learning and clustering techniques. Being parametric, a severe drawback of these methods is the requirement to specify the number of latent patterns in advance. In this paper, we explore the use of Bayesian nonparametric methods, a recent data modelling framework in machine learning, to infer latent patterns from sensor data acquired in a pervasive setting. Under this formalism, nonparametric prior distributions are used for the data generative process, and thus they allow the number of latent patterns to be learned automatically and to grow with the data - as more data comes in, the model complexity can grow to explain new and unseen patterns. In particular, we make use of the hierarchical Dirichlet process (HDP) to infer atomic activities and interaction patterns from honest signals collected from sociometric badges. We show how data from these sensors can be represented and learned with the HDP. We illustrate insights into atomic patterns learned by the model and use them to achieve high-performance clustering. We also demonstrate the framework on the popular Reality Mining dataset, illustrating the ability of the model to automatically infer typical social groups in this dataset. Finally, our framework is generic and applicable to a much wider range of problems in pervasive computing where one needs to infer high-level, latent patterns and contexts from sensor data. },
        CODE = { http://prada-research.net/~dinh/index.php?n=Main.Code#HDP_code },
        FILE = { :nguyen_phung_gupta_venkatesh_percom13 - Extraction of Latent Patterns and Contexts from Social Honest Signals Using Hierarchical Dirichlet Processes.pdf:PDF },
        OWNER = { Phung, Dinh },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/Nguyen_etal_percom13.pdf },
    }
C
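The HDP used above is built from Dirichlet-process components whose partition behaviour is described by the Chinese restaurant process (CRP): a new point joins an existing cluster with probability proportional to the cluster's size, or opens a new cluster with probability proportional to the concentration alpha. A single-level CRP draw can be sketched as follows (an illustrative sketch only, not the paper's inference code; names are my own):

```python
import random

def crp_partition(n_customers, alpha, rng=None):
    """One draw from the Chinese restaurant process with concentration alpha.
    Returns (assignments, tables) where tables[k] is the size of cluster k."""
    rng = rng or random.Random(0)
    tables = []       # tables[k] = number of customers at table k
    assignments = []  # assignments[i] = table chosen by customer i
    for i in range(n_customers):
        # P(existing table k) is proportional to tables[k];
        # P(new table) is proportional to alpha; normaliser is i + alpha.
        u = rng.random() * (i + alpha)
        acc = 0.0
        for k, size in enumerate(tables):
            acc += size
            if u < acc:
                tables[k] += 1
                assignments.append(k)
                break
        else:
            tables.append(1)
            assignments.append(len(tables) - 1)
    return assignments, tables
```

The number of occupied tables grows roughly as alpha * log(n), which is the sense in which "the model complexity can grow to explain new and unseen patterns".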
  • Thurstonian Boltzmann Machines: Learning from Multiple Inequalities
    Truyen, T., Phung, D. and Venkatesh, S. In International Conference on Machine Learning (ICML), Atlanta, USA, June 16-21, 2013. [ | ]
    We introduce Thurstonian Boltzmann Machines (TBM), a unified architecture that can naturally incorporate a wide range of data inputs at the same time. It is based on the observation that many data types can be considered as being generated from a subset of underlying continuous variables, and that each input value signifies several respective inequalities. Thus, learning a TBM is essentially learning to make sense of a set of inequalities. The TBM naturally supports the following types: Gaussian, intervals, censored, binary, categorical, multi-categorical, ordinal, and (in)complete ranks with and without ties. We demonstrate the versatility and capacity of the proposed model on three applications of very different natures, namely handwritten digit recognition, collaborative filtering and complex survey analysis.
    @INPROCEEDINGS { truyen_phung_venkatesh_icml13,
        TITLE = { {T}hurstonian {B}oltzmann Machines: Learning from Multiple Inequalities },
        AUTHOR = { Truyen, T. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { International Conference on Machine Learning (ICML) },
        YEAR = { 2013 },
        ADDRESS = { Atlanta, USA },
        MONTH = { June 16-21 },
        ABSTRACT = { We introduce Thurstonian Boltzmann Machines (TBM), a unified architecture that can naturally incorporate a wide range of data inputs at the same time. It is based on the observation that many data types can be considered as being generated from a subset of underlying continuous variables, and that each input value signifies several respective inequalities. Thus, learning a TBM is essentially learning to make sense of a set of inequalities. The TBM naturally supports the following types: Gaussian, intervals, censored, binary, categorical, multi-categorical, ordinal, and (in)complete ranks with and without ties. We demonstrate the versatility and capacity of the proposed model on three applications of very different natures, namely handwritten digit recognition, collaborative filtering and complex survey analysis. },
        OWNER = { dinh },
        TIMESTAMP = { 2013.03.01 },
    }
C
  • Factorial Multi-Task Learning : A Bayesian Nonparametric Approach
    Gupta, S., Phung, D. and Venkatesh, S. In Proceedings of the International Conference on Machine Learning (ICML), Atlanta, USA, June 16-21, 2013. [ | ]
    Multi-task learning is a paradigm shown to improve the performance of related tasks through their joint learning. However, for real-world data, it is usually difficult to assess the task relatedness, and joint learning with unrelated tasks may lead to serious performance degradation. To this end, we propose a framework that groups the tasks based on their relatedness in a low-dimensional subspace and allows a varying degree of relatedness among tasks by sharing the subspace bases across the groups. This provides the flexibility of no sharing when two sets of tasks are unrelated and partial/total sharing when the tasks are related. Importantly, the number of task groups and the subspace dimensionality are automatically inferred from the data. This frees the model from being tied to a specific set of parameters. To realize our framework, we present a novel Bayesian nonparametric prior that extends the traditional hierarchical beta process prior using a Dirichlet process to permit a potentially infinite number of child beta processes. We apply our model to multi-task regression and classification applications. Experimental results using several synthetic and real-world datasets show the superiority of our model over other recent state-of-the-art multi-task learning methods.
    @INPROCEEDINGS { gupta_phung_venkatesh_icml13,
        TITLE = { Factorial Multi-Task Learning : A Bayesian Nonparametric Approach },
        AUTHOR = { Gupta, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of International Conference on Machine Learning (ICML) },
        YEAR = { 2013 },
        ADDRESS = { Atlanta, USA },
        MONTH = { June 16-21 },
        ABSTRACT = { Multi-task learning is a paradigm shown to improve the performance of related tasks through their joint learning. However, for real-world data, it is usually difficult to assess the task relatedness, and joint learning with unrelated tasks may lead to serious performance degradation. To this end, we propose a framework that groups the tasks based on their relatedness in a low-dimensional subspace and allows a varying degree of relatedness among tasks by sharing the subspace bases across the groups. This provides the flexibility of no sharing when two sets of tasks are unrelated and partial/total sharing when the tasks are related. Importantly, the number of task groups and the subspace dimensionality are automatically inferred from the data. This frees the model from being tied to a specific set of parameters. To realize our framework, we present a novel Bayesian nonparametric prior that extends the traditional hierarchical beta process prior using a Dirichlet process to permit a potentially infinite number of child beta processes. We apply our model to multi-task regression and classification applications. Experimental results using several synthetic and real-world datasets show the superiority of our model over other recent state-of-the-art multi-task learning methods. },
        TIMESTAMP = { 2013.04.16 },
    }
C
  • An Integrated Framework for Suicide Risk Prediction
    Tran, T., Phung, D., Luo, W., Harvey, R., Berk, M. and Venkatesh, S. In Proc. of ACM Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD), Chicago, US, 2013. [ | ]
    @INPROCEEDINGS { truyen_phung_luo_harvey_berk_venkatesh_sigkdd13,
        TITLE = { An Integrated Framework for Suicide Risk Prediction },
        AUTHOR = { Tran, T. and Phung, D. and Luo, W. and Harvey, R. and Berk, M. and Venkatesh, S. },
        BOOKTITLE = { Proc. of ACM Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD) },
        YEAR = { 2013 },
        ADDRESS = { Chicago, US },
        TIMESTAMP = { 2013.06.07 },
    }
C
  • Sparse Subspace Clustering via Group Sparse Coding
    Saha, B., Pham, D.S., Phung, D. and Venkatesh, S. In Proceedings of the SIAM International Conference on Data Mining (SDM), pages 130-138, Texas, USA, May 2013. [ | ]
    Sparse subspace representation is an emerging and powerful approach for clustering data whose generative model is a union of subspaces. Existing sparse subspace representation methods are restricted to the single-task setting, which consequently leads to inefficient computation and sub-optimal performance. To address this limitation, we propose a novel method that regularizes sparse subspace representation by exploiting the structural sharing between tasks and data points. The first regularizer operates at the group level, where we seek sparsity between groups but density within each group. The second regularizer models the interactions down to the data-point level via the well-known graph regularization technique. We also derive simple, provably convergent, and extremely computationally efficient algorithms for solving the proposed group formulations. We evaluate the proposed methods over a wide range of large-scale clustering problems, from challenging health care data to image and text clustering benchmark datasets, and show that they outperform the state of the art considerably.
    @INPROCEEDINGS { saha_pham_phung_venkatesh_sdm13,
        TITLE = { Sparse Subspace Clustering via Group Sparse Coding },
        AUTHOR = { Saha, B. and Pham, D.S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of the SIAM International Conference on Data Mining (SDM) },
        YEAR = { 2013 },
        ADDRESS = { Texas, USA },
        MONTH = { May },
        PAGES = { 130-138 },
        ABSTRACT = { Sparse subspace representation is an emerging and powerful approach for clustering data whose generative model is a union of subspaces. Existing sparse subspace representation methods are restricted to the single-task setting, which consequently leads to inefficient computation and sub-optimal performance. To address this limitation, we propose a novel method that regularizes sparse subspace representation by exploiting the structural sharing between tasks and data points. The first regularizer operates at the group level, where we seek sparsity between groups but density within each group. The second regularizer models the interactions down to the data-point level via the well-known graph regularization technique. We also derive simple, provably convergent, and extremely computationally efficient algorithms for solving the proposed group formulations. We evaluate the proposed methods over a wide range of large-scale clustering problems, from challenging health care data to image and text clustering benchmark datasets, and show that they outperform the state of the art considerably. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
C
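The group-level regularizer in the entry above ("sparsity between groups but density within each group") is the group-lasso idea, whose proximal operator is block soft-thresholding: a whole group of coefficients is shrunk together and zeroed out only as a unit. A minimal sketch of that operator, illustrative only and not the authors' algorithm:

```python
import math

def block_soft_threshold(group, t):
    """Proximal operator of the group-lasso penalty t * ||g||_2:
    shrink the whole group towards zero, zeroing it entirely if ||g||_2 <= t."""
    norm = math.sqrt(sum(x * x for x in group))
    if norm <= t:
        return [0.0] * len(group)
    scale = 1.0 - t / norm
    return [scale * x for x in group]
```

Because the shrinkage acts on the group's Euclidean norm rather than on each coefficient, either the whole group survives (dense within the group) or the whole group vanishes (sparse between groups).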
  • Mood sensing from social media texts and its applications
    Nguyen, T., Phung, D., Adams, B. and Venkatesh, S. Knowledge and Information Systems, 2013. [ | | pdf]
    We present a large-scale mood analysis of social media texts. We organize the paper in three parts: 1) we address the problem of feature selection and classification of mood in the blogosphere; 2) we extract global mood patterns at different levels of aggregation from a large-scale dataset of approximately 18 million documents; and 3) we extract the mood trajectory of an egocentric user and study how it can be used to detect subtle emotion signals in a user-centric manner, supporting discovery of hyper-groups of communities based on sentiment information. For mood classification, two feature sets proposed in psychology are used, showing that these features are efficient, do not require a training phase and yield classification results comparable to state-of-the-art, supervised feature-selection schemes; on mood patterns, empirical results for mood organisation in the blogosphere are provided, analogous to the structure of human emotion proposed independently in the psychology literature; and on community structure discovery, a sentiment-based approach can yield useful insights into community formation.
    @ARTICLE { nguyen_phung_adams_venkatesh_kais13,
        TITLE = { Mood sensing from social media texts and its applications },
        AUTHOR = { Nguyen, T. and Phung, D. and Adams, B. and Venkatesh, S. },
        JOURNAL = { Knowledge and Information Systems },
        YEAR = { 2013 },
        PAGES = { 1-36 },
        ABSTRACT = { We present a large-scale mood analysis of social media texts. We organize the paper in three parts: 1) we address the problem of feature selection and classification of mood in the blogosphere; 2) we extract global mood patterns at different levels of aggregation from a large-scale dataset of approximately 18 million documents; and 3) we extract the mood trajectory of an egocentric user and study how it can be used to detect subtle emotion signals in a user-centric manner, supporting discovery of hyper-groups of communities based on sentiment information. For mood classification, two feature sets proposed in psychology are used, showing that these features are efficient, do not require a training phase and yield classification results comparable to state-of-the-art, supervised feature-selection schemes; on mood patterns, empirical results for mood organisation in the blogosphere are provided, analogous to the structure of human emotion proposed independently in the psychology literature; and on community structure discovery, a sentiment-based approach can yield useful insights into community formation. },
        CITESEERURL = { http://prada-research.net/~dinh/uploads/Main/Publications/Nguyen_etal_13mood.pdf },
        DOI = { 10.1007/s10115-013-0628-8 },
        ISSN = { 0219-1377 },
        KEYWORDS = { Mood sensing; Mood classification; Mood pattern; Hyper-community },
        LANGUAGE = { English },
        OWNER = { thinng },
        PUBLISHER = { Springer-Verlag },
        TIMESTAMP = { 2013.04.03 },
        URL = { http://link.springer.com/article/10.1007/s10115-013-0628-8/ },
    }
J
  • TOBY: Early Intervention in Autism through Technology
    Venkatesh, S., Phung, D., Greenhill, S., Duong, T. and Adams, B. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pages 3187-3196, Paris, France, April 2013. [ | ]
    @INPROCEEDINGS { venkatesh_phung_greenhill_duong_adams_chi13,
        TITLE = { {TOBY}: Early Intervention in Autism through Technology },
        AUTHOR = { Venkatesh, S. and Phung, D. and Greenhill, S. and Duong, T. and Adams, B. },
        BOOKTITLE = { Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI) },
        YEAR = { 2013 },
        ADDRESS = { Paris, France },
        MONTH = { April },
        PAGES = { 3187-3196 },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
C
  • Topic Model Kernel: An Empirical Study Towards Probabilistically Reduced Features For Classification
    Nguyen, Tien V., Phung, D. and Venkatesh, S. Technical report, Pattern Recognition and Data Analytics, Deakin University, 2013. [ | | pdf]
    Probabilistic topic models have become a standard in modern machine learning, with wide applications in organizing and summarizing `documents' in high-dimensional data such as images, videos, texts, gene expression data, and so on. Representing data by the reduced-dimensional mixture proportions extracted from topic models is not only richer in semantics than the bag-of-words interpretation, but also more informative for classification tasks. This paper describes the Topic Model Kernel (TMK), a high-dimensional mapping for Support Vector Machine classification of data generated from probabilistic topic models. The applicability of our proposed kernel is demonstrated in several classification tasks on real-world datasets. We outperform existing kernels on the distributional features and give comparative results on non-probabilistic data types.
    @TECHREPORT { nguyen_phung_venkatesh_tr13,
        TITLE = { Topic Model Kernel: An Empirical Study Towards Probabilistically Reduced Features For Classification },
        AUTHOR = { Nguyen, Tien V. and Phung, D. and Venkatesh, S. },
        INSTITUTION = { Pattern Recognition and Data Analytics, Deakin University },
        YEAR = { 2013 },
        ABSTRACT = { Probabilistic topic models have become a standard in modern machine learning, with wide applications in organizing and summarizing `documents' in high-dimensional data such as images, videos, texts, gene expression data, and so on. Representing data by the reduced-dimensional mixture proportions extracted from topic models is not only richer in semantics than the bag-of-words interpretation, but also more informative for classification tasks. This paper describes the Topic Model Kernel (TMK), a high-dimensional mapping for Support Vector Machine classification of data generated from probabilistic topic models. The applicability of our proposed kernel is demonstrated in several classification tasks on real-world datasets. We outperform existing kernels on the distributional features and give comparative results on non-probabilistic data types. },
        OWNER = { nguyen },
        TIMESTAMP = { 2013.07.01 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/nguyen_etal_tr13.pdf },
    }
R
  • Regularized nonnegative shared subspace learning
    Gupta, S., Phung, D., Adams, B. and Venkatesh, S. Data Mining and Knowledge Discovery, 26(1):57-97, January 2013. [ | ]
    Joint modeling of related data sources has the potential to improve various data mining tasks such as transfer learning, multitask clustering and information retrieval. However, diversity among various data sources might outweigh the advantages of the joint modeling, and thus may result in performance degradation. To this end, we propose a regularized shared subspace learning framework, which can exploit the mutual strengths of related data sources while being immune to the effects of the variabilities of each source. This is achieved by further imposing a mutual orthogonality constraint on the constituent subspaces, which segregates the common patterns from the source-specific patterns and thus avoids performance degradation. Our approach is rooted in nonnegative matrix factorization and extends it further to enable joint analysis of related data sources. Experiments performed using three real-world data sets for both retrieval and clustering applications demonstrate the benefits of regularization and validate the effectiveness of the model. Our proposed solution provides a formal framework appropriate for jointly analyzing related data sources and is therefore applicable to a wider context in data mining.
    @ARTICLE { gupta_phung_adams_venkatesh_dami13,
        TITLE = { Regularized nonnegative shared subspace learning },
        AUTHOR = { Gupta, S. and Phung, D. and Adams, B. and Venkatesh, S. },
        JOURNAL = { Data Mining and Knowledge Discovery },
        YEAR = { 2013 },
        MONTH = { January },
        NUMBER = { 1 },
        PAGES = { 57-97 },
        VOLUME = { 26 },
        ABSTRACT = { Joint modeling of related data sources has the potential to improve various data mining tasks such as transfer learning, multitask clustering and information retrieval. However, diversity among various data sources might outweigh the advantages of the joint modeling, and thus may result in performance degradation. To this end, we propose a regularized shared subspace learning framework, which can exploit the mutual strengths of related data sources while being immune to the effects of the variabilities of each source. This is achieved by further imposing a mutual orthogonality constraint on the constituent subspaces, which segregates the common patterns from the source-specific patterns and thus avoids performance degradation. Our approach is rooted in nonnegative matrix factorization and extends it further to enable joint analysis of related data sources. Experiments performed using three real-world data sets for both retrieval and clustering applications demonstrate the benefits of regularization and validate the effectiveness of the model. Our proposed solution provides a formal framework appropriate for jointly analyzing related data sources and is therefore applicable to a wider context in data mining. },
        COMMENT = { coauthor },
        OWNER = { 14232334 },
        PUBLISHER = { Springer },
        TIMESTAMP = { 2011.09.29 },
    }
J
  • Split-Merge Augmented Gibbs Sampling for Hierarchical Dirichlet Processes
    Rana, S., Phung, D. and Venkatesh, S. In Proc. of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 546-557, Gold Coast, Queensland, Australia, April 2013. [ | ]
    The Hierarchical Dirichlet Process (HDP) model is an important tool for topic analysis. Inference can be done through a Gibbs sampler using the auxiliary variable method. We propose a split-merge procedure to augment this method of inference, facilitating faster convergence. Whilst the incremental Gibbs sampler changes topic assignments of each word conditioned on the previous observations and model hyperparameters, the split-merge sampler changes the topic assignments over a group of words in a single move. This allows efficient exploration of the state space. We evaluate the proposed sampler on a synthetic test set and two benchmark document corpora and show that the proposed sampler enables the MCMC chain to converge faster to the desired stationary distribution.
    @INPROCEEDINGS { rana_phung_venkatesh_pakdd13,
        TITLE = { Split-Merge Augmented {G}ibbs Sampling for Hierarchical {D}irichlet Processes },
        AUTHOR = { Rana, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proc. of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2013 },
        ADDRESS = { Gold Coast, Queensland, Australia },
        MONTH = { April },
        PAGES = { 546-557 },
        ABSTRACT = { The Hierarchical Dirichlet Process (HDP) model is an important tool for topic analysis. Inference can be done through a Gibbs sampler using the auxiliary variable method. We propose a split-merge procedure to augment this method of inference, facilitating faster convergence. Whilst the incremental Gibbs sampler changes topic assignments of each word conditioned on the previous observations and model hyperparameters, the split-merge sampler changes the topic assignments over a group of words in a single move. This allows efficient exploration of the state space. We evaluate the proposed sampler on a synthetic test set and two benchmark document corpora and show that the proposed sampler enables the MCMC chain to converge faster to the desired stationary distribution. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
C
  • Latent Patient Profile Modelling and Applications with Mixed-Variate Restricted Boltzmann Machine
    Nguyen, Tu Dinh, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In Advances in Knowledge Discovery and Data Mining, pages 123-135, Gold Coast, Queensland, Australia, April 2013. [ | ]
    Efficient management of chronic diseases is critical in modern health care. We consider diabetes mellitus, and our ongoing goal is to examine how machine learning can deliver information for clinical efficiency. The challenge is to aggregate highly heterogeneous sources, including demographics, diagnoses, pathologies and treatments, and extract similar groups so that care plans can be designed. To this end, we propose to use a recent advance, the mixed-variate restricted Boltzmann machine (MV.RBM), as it seamlessly integrates multiple data types for each patient aggregated over time and outputs a homogeneous representation called a “latent profile” that can be used for patient clustering, visualisation, disease correlation analysis and prediction. We demonstrate that the method outperforms all baselines on these tasks: the primary characteristics of patients in the same groups can be identified, and good results can be achieved for diagnosis code prediction.
    @INPROCEEDINGS { tu_truyen_phung_venkatesh_pakdd13,
        TITLE = { Latent Patient Profile Modelling and Applications with Mixed-Variate Restricted {B}oltzmann Machine },
        AUTHOR = { Nguyen, Tu Dinh and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { Advances in Knowledge Discovery and Data Mining },
        YEAR = { 2013 },
        ADDRESS = { Gold Coast, Queensland, Australia },
        MONTH = { April },
        ISBN = { 978-3-642-37452-4 },
        PAGES = { 123--135 },
        PUBLISHER = { Springer-Verlag Berlin Heidelberg },
        VOLUME = { 7818 },
        ABSTRACT = { Efficient management of chronic diseases is critical in modern health care. We consider diabetes mellitus, and our ongoing goal is to examine how machine learning can deliver information for clinical efficiency. The challenge is to aggregate highly heterogeneous sources, including demographics, diagnoses, pathologies and treatments, and extract similar groups so that care plans can be designed. To this end, we propose to use a recent advance, the mixed-variate restricted Boltzmann machine (MV.RBM), as it seamlessly integrates multiple data types for each patient aggregated over time and outputs a homogeneous representation called a “latent profile” that can be used for patient clustering, visualisation, disease correlation analysis and prediction. We demonstrate that the method outperforms all baselines on these tasks: the primary characteristics of patients in the same groups can be identified, and good results can be achieved for diagnosis code prediction. },
        OWNER = { tund },
        TIMESTAMP = { 2013.01.07 },
    }
C
  • Clustering Patient Medical Records via Sparse Subspace Representation
    Saha, B., Phung, D., Pham, D. and Venkatesh, S. In Proc. of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pages 123-134, Gold Coast, Queensland, Australia, April 2013. [ | ]
    The health industry is facing an increasing challenge with "big data" as traditional methods fail to manage its scale and complexity. This paper examines clustering of patient records for chronic diseases to facilitate a better construction of care plans. We solve this problem under the framework of subspace clustering. Our novel contribution lies in the exploitation of sparse representation to discover subspaces automatically and a domain-specific construction of weighting matrices for patient records. We show that the new formulation is readily solved by extending existing ℓ1-regularized optimization algorithms. Using a cohort of both diabetes and stroke data, we show that we outperform existing benchmark clustering techniques in the literature.
    @INPROCEEDINGS { saha_phung_pham_venkatesh_pakdd13,
        TITLE = { Clustering Patient Medical Records via Sparse Subspace Representation },
        AUTHOR = { Saha, B. and Phung, D. and Pham, D. and Venkatesh, S. },
        BOOKTITLE = { Proc. of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2013 },
        ADDRESS = { Gold Coast, Queensland, Australia },
        MONTH = { April },
        PAGES = { 123-134 },
        ABSTRACT = { The health industry is facing an increasing challenge with "big data" as traditional methods fail to manage its scale and complexity. This paper examines clustering of patient records for chronic diseases to facilitate a better construction of care plans. We solve this problem under the framework of subspace clustering. Our novel contribution lies in the exploitation of sparse representation to discover subspaces automatically and a domain-specific construction of weighting matrices for patient records. We show that the new formulation is readily solved by extending existing ℓ1-regularized optimization algorithms. Using a cohort of both diabetes and stroke data, we show that we outperform existing benchmark clustering techniques in the literature. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
C
  • Learning sparse latent representation and distance metric for image retrieval
    Nguyen, Tu Dinh, Tran, Truyen, Phung, Dinh and Venkatesh, Svetha. In IEEE International Conference on Multimedia and Expo (ICME), pages 1-6, 2013. [ | ]
    The performance of image retrieval depends critically on the semantic representation and the distance function used to estimate the similarity of two images. A good representation should integrate multiple visual and textual (e.g., tag) features and offer a step closer to the true semantics of interest (e.g., concepts). As the distance function operates on the representation, they are interdependent, and thus should be addressed at the same time. We propose a probabilistic solution to learn both the representation from multiple feature types and modalities and the distance metric from data. The learning is regularised so that the learned representation and information-theoretic metric will (i) preserve the regularities of the visual/textual spaces, (ii) enhance structured sparsity, (iii) encourage small intra-concept distances, and (iv) keep inter-concept images separated. We demonstrate the capacity of our method on the NUS-WIDE data. For the well-studied 13 animal subset, our method outperforms state-of-the-art rivals. On the subset of single-concept images, we gain 79.5% improvement over the standard nearest neighbours approach on the MAP score, and 45.7% on the NDCG.
    @INPROCEEDINGS { tu_truyen_phung_venkatesh_icme13,
        TITLE = { Learning sparse latent representation and distance metric for image retrieval },
        AUTHOR = { Nguyen, Tu Dinh and Tran, Truyen and Phung, Dinh and Venkatesh, Svetha },
        BOOKTITLE = { IEEE International Conference on Multimedia and Expo (ICME) },
        YEAR = { 2013 },
        PAGES = { 1-6 },
        ABSTRACT = { The performance of image retrieval depends critically on the semantic representation and the distance function used to estimate the similarity of two images. A good representation should integrate multiple visual and textual (e.g., tag) features and offer a step closer to the true semantics of interest (e.g., concepts). As the distance function operates on the representation, they are interdependent, and thus should be addressed at the same time. We propose a probabilistic solution to learn both the representation from multiple feature types and modalities and the distance metric from data. The learning is regularised so that the learned representation and information-theoretic metric will (i) preserve the regularities of the visual/textual spaces, (ii) enhance structured sparsity, (iii) encourage small intra-concept distances, and (iv) keep inter-concept images separated. We demonstrate the capacity of our method on the NUS-WIDE data. For the well-studied 13 animal subset, our method outperforms state-of-the-art rivals. On the subset of single-concept images, we gain 79.5% improvement over the standard nearest neighbours approach on the MAP score, and 45.7% on the NDCG. },
        DOI = { 10.1109/ICME.2013.6607435 },
        ISSN = { 1945-7871 },
        KEYWORDS = { Abstracts;Australia;Rabbits;Whales;Image retrieval;Mixed-Variate;NUS-WIDE;Restricted Boltzmann Machines;metric learning;sparsity },
        OWNER = { tund },
        TIMESTAMP = { 2013.10.15 },
    }
C
  • Exploiting Side Information in Distance Dependent Chinese Restaurant Processes for Data Clustering
    Li, C., Phung, D., Rana, S. and Venkatesh, S.. In The 2013 IEEE International Conference on Multimedia and Expo (ICME 2013), San Jose, California, USA, July 2013. [ | | ]
    Multimedia contents often possess weakly annotated data such as tags, links and interactions. The weakly annotated data is called side information. It is the auxiliary information of data and provides hints for exploring the link structure of data. Most clustering algorithms utilize pure data for clustering. A model that combines pure data and side information, such as images and tags, documents and keywords, can perform better at understanding the underlying structure of data. We demonstrate how to incorporate different types of side information into a recently proposed Bayesian non-parametric model, the distance dependent Chinese restaurant process (DD-CRP). Our algorithm embeds the affinity of this information into the decay function of the DD-CRP when side information is in the form of subsets of discrete labels. It is flexible to measure distance based on arbitrary side information instead of only the spatial layout or time stamp of observations. At the same time, for noisy and incomplete side information, we set the decay function so that the DD-CRP reduces to the traditional Chinese restaurant process, thus not inducing side effects of noisy and incomplete side information. Experimental evaluations on two real-world datasets, NUS-WIDE and 20 Newsgroups, show that exploiting side information in the DD-CRP significantly improves the clustering performance.
    @INPROCEEDINGS { li_phung_rana_venkatesh_icme13,
        TITLE = { Exploiting Side Information in Distance Dependent Chinese Restaurant Processes for Data Clustering },
        AUTHOR = { Li, C. and Phung, D. and Rana, S. and Venkatesh, S. },
        BOOKTITLE = { The 2013 IEEE International Conference on Multimedia and Expo (ICME 2013) },
        YEAR = { 2013 },
        ADDRESS = { San Jose, California, USA },
        MONTH = { July },
        ABSTRACT = { Multimedia contents often possess weakly annotated data such as tags, links and interactions. The weakly annotated data is called side information. It is the auxiliary information of data and provides hints for exploring the link structure of data. Most clustering algorithms utilize pure data for clustering. A model that combines pure data and side information, such as images and tags, documents and keywords, can perform better at understanding the underlying structure of data. We demonstrate how to incorporate different types of side information into a recently proposed Bayesian non-parametric model, the distance dependent Chinese restaurant process (DD-CRP). Our algorithm embeds the affinity of this information into the decay function of the DD-CRP when side information is in the form of subsets of discrete labels. It is flexible to measure distance based on arbitrary side information instead of only the spatial layout or time stamp of observations. At the same time, for noisy and incomplete side information, we set the decay function so that the DD-CRP reduces to the traditional Chinese restaurant process, thus not inducing side effects of noisy and incomplete side information. Experimental evaluations on two real-world datasets, NUS-WIDE and 20 Newsgroups, show that exploiting side information in the DD-CRP significantly improves the clustering performance. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.04.12 },
        URL = { 2013/coference/Cheng_etal_ICME2013.pdf },
    }
C
  • Online Social Capital: Mood, Topical and Psycholinguistic Analysis
    Nguyen, T., Dao, B., Phung, D., Venkatesh, S. and Berk, M.. In AAAI Int. Conf. on Weblogs and Social Media (ICWSM), pages 449-456, Boston, USA, July 2013. [ | | pdf]
    Social media provides rich sources of personal information and community interaction which can be linked to aspects of mental health. In this paper we investigate manifest properties of textual messages, including latent topics, psycholinguistic features, and authors' mood, of a large corpus of blog posts, to analyze the aspect of social capital in social media communities. Using data collected from Live Journal, we find that bloggers with lower social capital have fewer positive moods and more negative moods than those with higher social capital. It is also found that people with low social capital have more random mood swings over time than people with high social capital. Significant differences are found between low and high social capital groups when characterized by a set of latent topics and psycholinguistic features derived from blog posts, suggesting discriminative features that prove useful for classification tasks. Good prediction is achieved when classifying among social capital groups using topic and linguistic features, with linguistic features found to have greater predictive power than latent topics. The significance of our work lies in the importance of online social capital to the potential construction of automatic healthcare monitoring systems. We further establish the link between mood and social capital in online communities, suggesting the foundation of new systems to monitor online mental well-being.
    @INPROCEEDINGS { nguyen_dao_phung_venkatesh_berk_icwsm13,
        TITLE = { Online Social Capital: Mood, Topical and Psycholinguistic Analysis },
        AUTHOR = { Nguyen, T. and Dao, B. and Phung, D. and Venkatesh, S. and Berk, M. },
        BOOKTITLE = { AAAI Int. Conf. on Weblogs and Social Media (ICWSM) },
        YEAR = { 2013 },
        ADDRESS = { Boston, USA },
        MONTH = { July },
        PAGES = { 449-456 },
        ABSTRACT = { Social media provides rich sources of personal information and community interaction which can be linked to aspects of mental health. In this paper we investigate manifest properties of textual messages, including latent topics, psycholinguistic features, and authors' mood, of a large corpus of blog posts, to analyze the aspect of social capital in social media communities. Using data collected from Live Journal, we find that bloggers with lower social capital have fewer positive moods and more negative moods than those with higher social capital. It is also found that people with low social capital have more random mood swings over time than people with high social capital. Significant differences are found between low and high social capital groups when characterized by a set of latent topics and psycholinguistic features derived from blog posts, suggesting discriminative features that prove useful for classification tasks. Good prediction is achieved when classifying among social capital groups using topic and linguistic features, with linguistic features found to have greater predictive power than latent topics. The significance of our work lies in the importance of online social capital to the potential construction of automatic healthcare monitoring systems. We further establish the link between mood and social capital in online communities, suggesting the foundation of new systems to monitor online mental well-being. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2011.06.03 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/nguyen_dao_phung_venkatesh_berk_icwsm13.pdf },
    }
C
  • Analysis of Psycholinguistic Processes and Topics in Online Autism Communities
    Nguyen, T., Phung, D. and Venkatesh, S.. In The IEEE International Conference on Multimedia and Expo (ICME), San Jose, USA, July 2013. [ | | pdf]
    Current growth of individuals on the autism spectrum disorder (ASD) requires continuous support and care. With the popularity of social media, online communities of people affected by ASD emerge. This paper presents an analysis of these online communities through understanding aspects that differentiate such communities. In this paper, the aspects given are not expressed in terms of friendship, exchange of information, social support or recreation, but rather with regard to the topics and linguistic styles that people express in their on-line writing. Using data collected unobtrusively from LiveJournal, we analyze posts made by ten autism communities in conjunction with those made by a control group of standard communities. Significant differences have been found between autism and control communities when characterized by latent topics of discussion and psycholinguistic features. Latent topics are found to have greater predictive power than linguistic features when classifying blog posts as either autism or control community. This study suggests that data mining of online blogs has the potential to detect clinically meaningful data. It opens the door to possibilities including sentinel risk surveillance and harnessing the power in diverse large datasets.
    @INPROCEEDINGS { nguyen_phung_venkatesh_icme13,
        TITLE = { Analysis of Psycholinguistic Processes and Topics in Online Autism Communities },
        AUTHOR = { Nguyen, T. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { The IEEE International Conference on Multimedia and Expo (ICME) },
        YEAR = { 2013 },
        ADDRESS = { San Jose, USA },
        MONTH = { July },
        ABSTRACT = { Current growth of individuals on the autism spectrum disorder (ASD) requires continuous support and care. With the popularity of social media, online communities of people affected by ASD emerge. This paper presents an analysis of these online communities through understanding aspects that differentiate such communities. In this paper, the aspects given are not expressed in terms of friendship, exchange of information, social support or recreation, but rather with regard to the topics and linguistic styles that people express in their on-line writing. Using data collected unobtrusively from LiveJournal, we analyze posts made by ten autism communities in conjunction with those made by a control group of standard communities. Significant differences have been found between autism and control communities when characterized by latent topics of discussion and psycholinguistic features. Latent topics are found to have greater predictive power than linguistic features when classifying blog posts as either autism or control community. This study suggests that data mining of online blogs has the potential to detect clinically meaningful data. It opens the door to possibilities including sentinel risk surveillance and harnessing the power in diverse large datasets. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2011.06.03 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/nguyen_phung_venkatesh_icme13.pdf },
    }
C
  • Toby Playpad: Empowering Parents to Provide Early Therapy in the Home
    Venkatesh, S., Phung, D., Greenhill, S., Duong, T. and Adams, B.. In Proceedings of the International Meeting for Autism Research (IMFAR), page (accepted), Donostia, Spain, May 2013. [ | ]
    @INPROCEEDINGS { venkatesh_phung_greenhill_duong_adams_imfar13,
        TITLE = { {Toby Playpad}: Empowering Parents to Provide Early Therapy in the Home },
        AUTHOR = { Venkatesh, S. and Phung, D. and Greenhill, S. and Duong, T. and Adams, B. },
        BOOKTITLE = { Proceedings of the International Meeting for Autism Research (IMFAR) },
        YEAR = { 2013 },
        ADDRESS = { Donostia, Spain },
        MONTH = { May },
        PAGES = { (accepted) },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
C
  • Toby Playpad: Empowering Parents to Provide Early Therapy in the Home (extended abstract)
    Venkatesh, S., Duong, T., Phung, D., Greenhill, S., Adams, B., Marshall, W. and Cairns, D.. In Proc. of BioAutism Conference, Australian Society for Autism Research (ASFAR), Melbourne, Australia, February 2013. [ | ]
    @INPROCEEDINGS { venkatesh_duong_phung_greenhill_adams_marshall_cairns_bioautism13,
        TITLE = { {Toby Playpad}: Empowering Parents to Provide Early Therapy in the Home (extended abstract) },
        AUTHOR = { Venkatesh, S. and Duong, T. and Phung, D. and Greenhill, S. and Adams, B. and Marshall, W. and Cairns, D. },
        BOOKTITLE = { Proc. of BioAutism Conference, Australian Society for Autism Research (ASFAR) },
        YEAR = { 2013 },
        ADDRESS = { Melbourne, Australia },
        MONTH = { February },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
C
  • Interactive Browsing System for Anomaly Video Surveillance
    Nguyen, T.V., Phung, D., Gupta, S. and Venkatesh, S.. In Proc. of IEEE Eighth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), pages 384-389, Melbourne, Australia, April 2013. [ | ]
    Existing anomaly detection methods in video surveillance exhibit a lack of congruence between rare events detected by algorithms and what is considered anomalous by users. This paper introduces a novel browsing model to address this issue, allowing users to interactively examine rare events in an intuitive manner. Introducing a novel way to compute rare motion patterns, we estimate latent factors of foreground motion patterns through Bayesian Nonparametric Factor analysis. Each factor corresponds to a typical motion pattern. A rarity score for each factor is computed, and ordered in decreasing order of rarity, permitting users to browse events using any proportion of rare factors. Rare events correspond to frames that contain the rare factors chosen. We present the user with an interface to inspect events that incorporate these rarest factors in a spatial-temporal manner. We demonstrate the system on a public video data set, showing key aspects of the browsing paradigm.
    @INPROCEEDINGS { nguyen_phung_gupta_venkatesh_issnip13,
        TITLE = { Interactive Browsing System for Anomaly Video Surveillance },
        AUTHOR = { Nguyen, T.V. and Phung, D. and Gupta, S. and Venkatesh, S. },
        BOOKTITLE = { Proc. of IEEE Eighth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP) },
        YEAR = { 2013 },
        ADDRESS = { Melbourne, Australia },
        MONTH = { April },
        PAGES = { 384-389 },
        ABSTRACT = { Existing anomaly detection methods in video surveillance exhibit a lack of congruence between rare events detected by algorithms and what is considered anomalous by users. This paper introduces a novel browsing model to address this issue, allowing users to interactively examine rare events in an intuitive manner. Introducing a novel way to compute rare motion patterns, we estimate latent factors of foreground motion patterns through Bayesian Nonparametric Factor analysis. Each factor corresponds to a typical motion pattern. A rarity score for each factor is computed, and ordered in decreasing order of rarity, permitting users to browse events using any proportion of rare factors. Rare events correspond to frames that contain the rare factors chosen. We present the user with an interface to inspect events that incorporate these rarest factors in a spatial-temporal manner. We demonstrate the system on a public video data set, showing key aspects of the browsing paradigm. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
C
  • TOBY playpad application to teach children with ASD: A pilot trial
    Moore, D., Venkatesh, S., Anderson, A., Phung, D., Greenhill, S., Duong, T., Cairns, D., Marshall, W. and Whitehouse, A.. Developmental Neurorehabilitation, 2013. [ | ]
    Purpose: To investigate use patterns and learning outcomes associated with the use of TOBY Playpad, an early intervention iPad application. Methods: Participants were 33 families with a child with an ASD aged 16 years or less, and with a diagnosis of Autism or Pervasive Developmental Disorder – Not Otherwise Specified, and no secondary diagnoses. Families were provided with TOBY and asked to use it for four to six weeks, without further prompting or coaching. Dependent variables included participant use patterns and initial indicators of child progress. Results: Twenty-three participants engaged extensively with TOBY, being exposed to at least 100 complete learn units (CLUs) and completing between 17% and 100% of the curriculum. Conclusions: TOBY may make a useful contribution to early intervention programming for children with ASD delivering high rates of appropriate learning opportunities. Further research evaluating the efficacy of TOBY in relation to independent indicators of functioning is warranted.
    @ARTICLE { Moore_Venkatesh_Anderson_Greenhill_Phung_Duong_Cairns_Marshall_Whitehouse_DevNeu13,
        TITLE = { {TOBY} playpad application to teach children with ASD: A pilot trial },
        AUTHOR = { Moore, D. and Venkatesh, S. and Anderson, A. and Phung, D. and Greenhill, S. and Duong, T. and Cairns, D. and Marshall, W. and Whitehouse, A. },
        JOURNAL = { Developmental Neurorehabilitation },
        YEAR = { 2013 },
        PAGES = { 1-5 },
        ABSTRACT = { Purpose: To investigate use patterns and learning outcomes associated with the use of TOBY Playpad, an early intervention iPad application. Methods: Participants were 33 families with a child with an ASD aged 16 years or less, and with a diagnosis of Autism or Pervasive Developmental Disorder – Not Otherwise Specified, and no secondary diagnoses. Families were provided with TOBY and asked to use it for four to six weeks, without further prompting or coaching. Dependent variables included participant use patterns and initial indicators of child progress. Results: Twenty-three participants engaged extensively with TOBY, being exposed to at least 100 complete learn units (CLUs) and completing between 17% and 100% of the curriculum. Conclusions: TOBY may make a useful contribution to early intervention programming for children with ASD delivering high rates of appropriate learning opportunities. Further research evaluating the efficacy of TOBY in relation to independent indicators of functioning is warranted. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.04.15 },
    }
J
  • TOBY: Therapy Outcome By You
    Venkatesh, S., Greenhill, S., Phung, D., Duong, T., Adams, B., Marshall, W. and Cairns, D.. In Proceedings of the Annual Autism Conference, Portland, USA, January 2013. [ | ]
    Early intervention is critical for children diagnosed with autism. Unfortunately, there is often a long gap of waiting, and wasting, time between a “formal” diagnosis and therapy. We describe TOBY Playpad (www.tobyplaypad.com) whose goal is to close this gap by empowering parents to help their children early. TOBY stands for Therapy Outcome by You and currently is an iPad application. It provides an adaptive syllabus of more than 320 activities developed by autism and machine learning experts to target key development areas which are known to be deficit for ASD children such as imitation, joint attention and language. TOBY delivers lessons, materials, instructions and interactions for both on-iPad and Natural Environment Tasks (NET) off-iPad activities. TOBY is highly adaptive and personalized, intelligently increasing its complexity, varying prompts and reinforcements as the child progresses over time. Prompting and reinforcing strategies are also recommended for parents to make the most of everyday opportunities to teach children. Essentially, TOBY removes the burden on parents from extensive preparation of materials and manual data recording. Three trials on 20, 50 and 36 children with AutismWest (www.autismwest.org.au) have been conducted since last year. The results are promising providing evidence of learning shown in skills that were not present previously in some children. NET activities are shown to be effective for children and popular with parents.
    @INPROCEEDINGS { venkatesh_phung_greenhill_duong_adams_marshall_cairns_abai13,
        TITLE = { {TOBY}: Therapy Outcome By You },
        AUTHOR = { Venkatesh, S. and Greenhill, S. and Phung, D. and Duong, T. and Adams, B. and Marshall, W. and Cairns, D. },
        BOOKTITLE = { Proceedings of the Annual Autism Conference },
        YEAR = { 2013 },
        ADDRESS = { Portland, USA },
        MONTH = { January },
        ABSTRACT = { Early intervention is critical for children diagnosed with autism. Unfortunately, there is often a long gap of waiting, and wasting, time between a “formal” diagnosis and therapy. We describe TOBY Playpad (www.tobyplaypad.com) whose goal is to close this gap by empowering parents to help their children early. TOBY stands for Therapy Outcome by You and currently is an iPad application. It provides an adaptive syllabus of more than 320 activities developed by autism and machine learning experts to target key development areas which are known to be deficit for ASD children such as imitation, joint attention and language. TOBY delivers lessons, materials, instructions and interactions for both on-iPad and Natural Environment Tasks (NET) off-iPad activities. TOBY is highly adaptive and personalized, intelligently increasing its complexity, varying prompts and reinforcements as the child progresses over time. Prompting and reinforcing strategies are also recommended for parents to make the most of everyday opportunities to teach children. Essentially, TOBY removes the burden on parents from extensive preparation of materials and manual data recording. Three trials on 20, 50 and 36 children with AutismWest (www.autismwest.org.au) have been conducted since last year. The results are promising providing evidence of learning shown in skills that were not present previously in some children. NET activities are shown to be effective for children and popular with parents. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
C
2012
  • Conditionally Dependent Dirichlet Processes for Modelling Naturally Correlated Data Sources
    Phung, D., Nguyen, X., Bui, H., Nguyen, T.V. and Venkatesh, S.. Technical report, Pattern Recognition and Data Analytics, Deakin University, 2012. [ | | pdf]
    We introduce a new class of conditionally dependent Dirichlet processes (CDP) for hierarchical mixture modelling of naturally correlated data sources. This class of models provides a Bayesian nonparametric approach for modelling a range of challenging datasets which typically consist of heterogeneous observations from multiple correlated data channels. Some typical examples include annotated social media, community networks where friendship and relational information coexist with users' profiles, and medical records where patient information exists in several dimensions (demographic information, medical history, drug use and so on). The proposed framework can easily be tailored to model multiple data sources which are correlated by some latent underlying processes, whereas most existing topic models, notably the hierarchical Dirichlet process (HDP), are designed for only a single data observation channel. In these existing approaches, data are grouped into documents (e.g., text documents), or they are grouped according to some covariates such as time or location. Our approach is different: we view context as distributions over some index space and model both topics and contexts jointly. Distributions over topic parameters are modelled according to the usual Dirichlet processes. The stick-breaking representation gives rise to explicit realizations of topic atoms which we use as an indexing mechanism to induce conditional random mixture distributions on the context observation spaces -- loosely speaking, we use a stochastic process, being a DP, to conditionally `index' other stochastic processes. The latter can be designed on any suitable family of stochastic processes to suit modelling needs or the data types of contexts (such as Beta or Gaussian processes). The Dirichlet process is of course an obvious choice.
Our model can be viewed as an integration of the hierarchical Dirichlet process (HDP) and the recent nested Dirichlet process (nDP) with shared mixture components. In fact, it admits an interesting interpretation wherein, under a suitable parameterization, integrating out the topic components results in a nested DP, whereas integrating out the context components results in a hierarchical DP. Different approaches for posterior inference exist. This paper focuses on the development of an auxiliary conditional Gibbs sampler in which both topic and context atoms are marginalized out. We demonstrate the framework on synthetic datasets for temporal topic modelling and trajectory discovery in video surveillance. We then demonstrate an application to a current visual category classification challenge in computer vision, for which we significantly outperform the current reported state-of-the-art results. Finally, it is worthwhile to note that our proposed approach can be easily adapted to accommodate different forms of supervision (weakly annotated data, semi-supervision) and to perform prediction.
    @TECHREPORT { phung_nguyen_bui_nguyen_venkatesh_tr12,
        TITLE = { Conditionally Dependent {D}irichlet Processes for Modelling Naturally Correlated Data Sources },
        AUTHOR = { Phung, D. and Nguyen, X. and Bui, H. and Nguyen, T.V. and Venkatesh, S. },
        INSTITUTION = { Pattern Recognition and Data Analytics, Deakin University },
        YEAR = { 2012 },
        ABSTRACT = { We introduce a new class of conditionally dependent Dirichlet processes (CDP) for hierarchical mixture modelling of naturally correlated data sources. This class of models provides a Bayesian nonparametric approach for modelling a range of challenging datasets which typically consist of heterogeneous observations from multiple correlated data channels. Some typical examples include annotated social media, community networks where friendship and relational information coexist with users' profiles, and medical records where patient information exists in several dimensions (demographic information, medical history, drug use and so on). The proposed framework can easily be tailored to model multiple data sources which are correlated by some latent underlying processes, whereas most existing topic models, notably the hierarchical Dirichlet process (HDP), are designed for only a single data observation channel. In these existing approaches, data are grouped into documents (e.g., text documents), or they are grouped according to some covariates such as time or location. Our approach is different: we view context as distributions over some index space and model both topics and contexts jointly. Distributions over topic parameters are modelled according to the usual Dirichlet processes. The stick-breaking representation gives rise to explicit realizations of topic atoms which we use as an indexing mechanism to induce conditional random mixture distributions on the context observation spaces -- loosely speaking, we use a stochastic process, being a DP, to conditionally `index' other stochastic processes. The latter can be designed on any suitable family of stochastic processes to suit modelling needs or the data types of contexts (such as Beta or Gaussian processes). The Dirichlet process is of course an obvious choice.
Our model can be viewed as an integration of the hierarchical Dirichlet process (HDP) and the recent nested Dirichlet process (nDP) with shared mixture components. In fact, it admits an interesting interpretation wherein, under a suitable parameterization, integrating out the topic components results in a nested DP, whereas integrating out the context components results in a hierarchical DP. Different approaches for posterior inference exist. This paper focuses on the development of an auxiliary conditional Gibbs sampler in which both topic and context atoms are marginalized out. We demonstrate the framework on synthetic datasets for temporal topic modelling and trajectory discovery in video surveillance. We then demonstrate an application to a current visual category classification challenge in computer vision, for which we significantly outperform the current reported state-of-the-art results. Finally, it is worthwhile to note that our proposed approach can be easily adapted to accommodate different forms of supervision (weakly annotated data, semi-supervision) and to perform prediction. },
        OWNER = { phung },
        TIMESTAMP = { 2012.10.31 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/phung_etal_tr12.pdf },
    }
R
  • A Slice Sampler for Restricted Hierarchical Beta Process with Applications to Shared Subspace Learning
    Gupta, S., Phung, D. and Venkatesh, S.. In International Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, USA, August 2012. [ | ]
    @INPROCEEDINGS { gupta_phung_venkatesh_uai12,
        TITLE = { A Slice Sampler for Restricted Hierarchical {B}eta Process with Applications to Shared Subspace Learning },
        AUTHOR = { Gupta, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { International Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2012 },
        ADDRESS = { Catalina Island, USA },
        MONTH = { August },
        OWNER = { dinh },
        TIMESTAMP = { 2012.05.24 },
    }
C
  • A Bayesian Nonparametric Joint Factor Model for Learning Shared and Individual Subspaces from Multiple Data Sources
    Gupta, S., Phung, D. and Venkatesh, S.. In Proc. of SIAM Int. Conference on Data Mining (SDM), Anaheim, California, USA, April 2012. [ | ]
    Joint analysis of multiple data sources is becoming increasingly popular in transfer learning, multi-task learning and cross-domain data mining. One promising approach to model the data jointly is through learning the shared and individual factor subspaces. However, the performance of this approach depends on the subspace dimensionalities and the level of sharing, which need to be specified a priori. To this end, we propose a nonparametric joint factor analysis framework for modeling multiple related data sources. Our model utilizes the hierarchical beta process as a nonparametric prior to automatically infer the number of shared and individual factors. For posterior inference, we provide a Gibbs sampling scheme using auxiliary variables. The effectiveness of the proposed framework is validated through its application to two real-world problems -- transfer learning in text and image retrieval.
    @INPROCEEDINGS { gupta_phung_venkatesh_sdm12,
        TITLE = { A {B}ayesian Nonparametric Joint Factor Model for Learning Shared and Individual Subspaces from Multiple Data Sources },
        AUTHOR = { Gupta, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proc. of SIAM Int. Conference on Data Mining (SDM) },
        YEAR = { 2012 },
        ADDRESS = { Anaheim, California, USA },
        MONTH = { April },
        ABSTRACT = { Joint analysis of multiple data sources is becoming increasingly popular in transfer learning, multi-task learning and cross-domain data mining. One promising approach to model the data jointly is through learning the shared and individual factor subspaces. However, the performance of this approach depends on the subspace dimensionalities and the level of sharing, which need to be specified a priori. To this end, we propose a nonparametric joint factor analysis framework for modeling multiple related data sources. Our model utilizes the hierarchical beta process as a nonparametric prior to automatically infer the number of shared and individual factors. For posterior inference, we provide a Gibbs sampling scheme using auxiliary variables. The effectiveness of the proposed framework is validated through its application to two real-world problems -- transfer learning in text and image retrieval. },
    }
C
  • A Sequential Decision Approach to Ordinal Preferences in Recommender Systems
    Truyen, T., Phung, D. and Venkatesh, S.. In Proceedings of AAAI Conf. on Artificial Intelligence (AAAI), Toronto, Canada, July 2012. [ | ]
    We propose a novel sequential decision approach to modeling ordinal ratings in collaborative filtering problems. The rating process is assumed to start from the lowest level, evaluate against the latent utility at the corresponding level, and move up until a suitable ordinal level is found. Crucial to this generative process are the underlying utility random variables that govern the generation of ratings and their modelling choices. To this end, we make novel use of the generalised extreme value distributions, which are found to be particularly suitable for our modeling tasks and, at the same time, facilitate our inference and learning procedure. The proposed approach can flexibly incorporate features from both the user and the item. We evaluate the proposed framework on three well-known datasets: MovieLens, Dating Agency and Netflix. In all cases, the proposed approach is demonstrated to be competitive against state-of-the-art collaborative filtering methods.
    @INPROCEEDINGS { truyen_phung_venkatesh_aaai12,
        TITLE = { A Sequential Decision Approach to Ordinal Preferences in Recommender Systems },
        AUTHOR = { Truyen, T. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of AAAI Conf. on Artificial Intelligence (AAAI) },
        YEAR = { 2012 },
        ADDRESS = { Toronto, Canada },
        MONTH = { July },
        ABSTRACT = { We propose a novel sequential decision approach to modeling ordinal ratings in collaborative filtering problems. The rating process is assumed to start from the lowest level, evaluate against the latent utility at the corresponding level, and move up until a suitable ordinal level is found. Crucial to this generative process are the underlying utility random variables that govern the generation of ratings and their modelling choices. To this end, we make novel use of the generalised extreme value distributions, which are found to be particularly suitable for our modeling tasks and, at the same time, facilitate our inference and learning procedure. The proposed approach can flexibly incorporate features from both the user and the item. We evaluate the proposed framework on three well-known datasets: MovieLens, Dating Agency and Netflix. In all cases, the proposed approach is demonstrated to be competitive against state-of-the-art collaborative filtering methods. },
        TIMESTAMP = { 2012.04.11 },
    }
C
  • Improved Subspace Clustering via Exploitation of Spatial Constraints
    Pham, S., Budhaditya, S., Phung, D. and Venkatesh, S.. In Proceedings of IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), Rhode Island, USA, June 2012. [ | ]
    We present a novel approach to improving subspace clustering by exploiting spatial constraints. The new method encourages the sparse solution to be consistent with the spatial geometry of the tracked points by embedding weights into the sparse formulation. By doing so, we are able to correct sparse representations in a principled manner without introducing much additional computational cost. We discuss alternative ways to treat missing and corrupted data using the latest theory in robust lasso regression, and suggest numerical algorithms to solve the proposed formulation. Experiments on the benchmark Johns Hopkins 155 dataset demonstrate that exploiting spatial constraints significantly improves motion segmentation.
    @INPROCEEDINGS { pham_budhaditya_phung_venkatesh_cvpr12,
        TITLE = { Improved Subspace Clustering via Exploitation of Spatial Constraints },
        AUTHOR = { Pham, S. and Budhaditya, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR) },
        YEAR = { 2012 },
        ADDRESS = { Rhode Island, USA },
        MONTH = { June },
        ABSTRACT = { We present a novel approach to improving subspace clustering by exploiting spatial constraints. The new method encourages the sparse solution to be consistent with the spatial geometry of the tracked points by embedding weights into the sparse formulation. By doing so, we are able to correct sparse representations in a principled manner without introducing much additional computational cost. We discuss alternative ways to treat missing and corrupted data using the latest theory in robust lasso regression, and suggest numerical algorithms to solve the proposed formulation. Experiments on the benchmark Johns Hopkins 155 dataset demonstrate that exploiting spatial constraints significantly improves motion segmentation. },
        OWNER = { thinng },
        TIMESTAMP = { 2012.04.11 },
    }
C
  • Sparse Subspace Representation for Spectral Document Clustering
    Saha, B., Phung, D., Pham, D.S. and Venkatesh, S.. In IEEE International Conference on Data Mining (ICDM), Brussels, Belgium, December 2012. [ | ]
    @INPROCEEDINGS { saha_phung_pham_venkatesh_icdm12,
        TITLE = { Sparse Subspace Representation for Spectral Document Clustering },
        AUTHOR = { Saha, B. and Phung, D. and Pham, D.S. and Venkatesh, S. },
        BOOKTITLE = { IEEE International Conference on Data Mining (ICDM) },
        YEAR = { 2012 },
        ADDRESS = { Brussels, Belgium },
        MONTH = { December },
        OWNER = { dinh },
        TIMESTAMP = { 2012.10.31 },
    }
C
  • A Sentiment-Aware Approach to Community Formation in Social Media
    Nguyen, T., Phung, D., Adams, B. and Venkatesh, S.. In AAAI Int. Conf. on Weblogs and Social Media (ICWSM), Dublin, Ireland, June 2012. [ | ]
    @INPROCEEDINGS { nguyen_phung_adams_venkatesh_icwsm12,
        TITLE = { A Sentiment-Aware Approach to Community Formation in Social Media },
        AUTHOR = { Nguyen, T. and Phung, D. and Adams, B. and Venkatesh, S. },
        BOOKTITLE = { AAAI Int. Conf. on Weblogs and Social Media (ICWSM) },
        YEAR = { 2012 },
        ADDRESS = { Dublin, Ireland },
        MONTH = { June },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2011.06.03 },
    }
C
  • Cumulative restricted Boltzmann machines for ordinal matrix data analysis
    Truyen, T., Phung, D. and Venkatesh, S.. In Proceedings of Asian Conference on Machine Learning (ACML), Singapore, November 2012. [ | ]
    Ordinal data is omnipresent in almost all multiuser-generated feedback -- questionnaires, preferences, etc. This paper investigates modelling of ordinal data with Gaussian restricted Boltzmann machines (RBMs). In particular, we present the model architecture, learning and inference procedures for both vector-variate and matrix-variate ordinal data. We show that our model is able to capture the latent opinion profile of citizens around the world, and is competitive against state-of-the-art collaborative filtering techniques on large-scale public datasets. The model thus has the potential to extend the application of RBMs to diverse domains such as recommendation systems, product reviews and expert assessments.
    @INPROCEEDINGS { truyen_phung_venkatesh_acml12a,
        TITLE = { Cumulative restricted {B}oltzmann machines for ordinal matrix data analysis },
        AUTHOR = { Truyen, T. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of Asian Conference on Machine Learning (ACML) },
        YEAR = { 2012 },
        ADDRESS = { Singapore },
        MONTH = { November },
        ABSTRACT = { Ordinal data is omnipresent in almost all multiuser-generated feedback -- questionnaires, preferences, etc. This paper investigates modelling of ordinal data with Gaussian restricted Boltzmann machines (RBMs). In particular, we present the model architecture, learning and inference procedures for both vector-variate and matrix-variate ordinal data. We show that our model is able to capture the latent opinion profile of citizens around the world, and is competitive against state-of-the-art collaborative filtering techniques on large-scale public datasets. The model thus has the potential to extend the application of RBMs to diverse domains such as recommendation systems, product reviews and expert assessments. },
    }
C
  • Learning From Ordered Sets and Applications in Collaborative Ranking
    Truyen, T., Phung, D and Venkatesh, S.. In Proceedings of Asian Conference on Machine Learning (ACML), Singapore, November 2012. [ | ]
    Ranking over sets arises when users choose between groups of items. For example, a group may be those movies deemed 5 stars by a user, or a customized tour package. It turns out that, to model this data type properly, we need to investigate the general combinatorial problem of partitioning a set and ordering the subsets. Here we construct a probabilistic log-linear model over a set of ordered subsets. Inference in this combinatorial space is highly challenging: the space size approaches (N!/2)/0.693145^{N+1} as N approaches infinity. We propose a split-and-merge Metropolis-Hastings procedure that can explore the state space efficiently. For discovering hidden aspects in the data, we enrich the model with latent binary variables so that the posteriors can be efficiently evaluated. Finally, we evaluate the proposed model on large-scale collaborative filtering tasks and demonstrate that it is competitive against state-of-the-art methods.
    @INPROCEEDINGS { truyen_phung_venkatesh_acml12b,
        TITLE = { Learning From Ordered Sets and Applications in Collaborative Ranking },
        AUTHOR = { Truyen, T. and Phung, D and Venkatesh, S. },
        BOOKTITLE = { Proceedings of Asian Conference on Machine Learning (ACML) },
        YEAR = { 2012 },
        ADDRESS = { Singapore },
        MONTH = { November },
        ABSTRACT = { Ranking over sets arises when users choose between groups of items. For example, a group may be those movies deemed 5 stars by a user, or a customized tour package. It turns out that, to model this data type properly, we need to investigate the general combinatorial problem of partitioning a set and ordering the subsets. Here we construct a probabilistic log-linear model over a set of ordered subsets. Inference in this combinatorial space is highly challenging: the space size approaches (N!/2)/0.693145^{N+1} as N approaches infinity. We propose a split-and-merge Metropolis-Hastings procedure that can explore the state space efficiently. For discovering hidden aspects in the data, we enrich the model with latent binary variables so that the posteriors can be efficiently evaluated. Finally, we evaluate the proposed model on large-scale collaborative filtering tasks and demonstrate that it is competitive against state-of-the-art methods. },
        OWNER = { thinng },
        TIMESTAMP = { 2013.04.12 },
    }
C
  • Emotional Reactions to Real-World Events in Social Networks
    Nguyen, T., Phung, D., Adams, B. and Venkatesh, S.. In New Frontiers in Applied Data Mining, pages 53-64. Springer, 2012. [ | | pdf]
    A convergence of emotions among people in social networks can result from the occurrence of an unprecedented real-world event; for example, a majority of bloggers reacted angrily to the September 11 terrorist attacks. Based on this observation, we introduce a sentiment index, computed from the current mood tags in a collection of blog posts utilizing an affective lexicon, that can potentially reveal subtle events discussed in the blogosphere. We then develop a method for extracting events based on this index and its distribution. Our second contribution is the establishment of a new bursty structure in text streams, termed a sentiment burst. We employ a stochastic model to detect bursty periods of moods and their associated events. Our results on a dataset of more than 12 million mood-tagged blog posts over a 4-year period show that our sentiment-based bursty events are indeed meaningful, in several ways.
    @INCOLLECTION { nguyen_phung_adams_venkatesh_lncs12,
        TITLE = { Emotional Reactions to Real-World Events in Social Networks },
        AUTHOR = { Nguyen, T. and Phung, D. and Adams, B. and Venkatesh, S. },
        BOOKTITLE = { New Frontiers in Applied Data Mining },
        PUBLISHER = { Springer },
        YEAR = { 2012 },
        EDITOR = { Cao, Longbing and Huang, Joshua and Bailey, James and Koh, Yun and Luo, Jun },
        PAGES = { 53--64 },
        ABSTRACT = { A convergence of emotions among people in social networks can result from the occurrence of an unprecedented real-world event; for example, a majority of bloggers reacted angrily to the September 11 terrorist attacks. Based on this observation, we introduce a sentiment index, computed from the current mood tags in a collection of blog posts utilizing an affective lexicon, that can potentially reveal subtle events discussed in the blogosphere. We then develop a method for extracting events based on this index and its distribution. Our second contribution is the establishment of a new bursty structure in text streams, termed a sentiment burst. We employ a stochastic model to detect bursty periods of moods and their associated events. Our results on a dataset of more than 12 million mood-tagged blog posts over a 4-year period show that our sentiment-based bursty events are indeed meaningful, in several ways. },
        OWNER = { dinh },
        TIMESTAMP = { 2012.04.05 },
        URL = { http://dx.doi.org/10.1007/978-3-642-28320-8_5 },
    }
BC
  • Pervasive multimedia for autism intervention
    Venkatesh, S., Greenhill, S., Phung, D., Adams, B. and Duong, T.. Pervasive and Mobile Computing (PMC), 8(6):863-882, 2012. [ | ]
    There is a growing gap between the number of children with autism requiring early intervention and the therapy available. We present a portable platform for pervasive delivery of early intervention therapy using multi-touch interfaces and principled ways to deliver stimuli of increasing complexity and adapt to a child's performance. Our implementation weaves Natural Environment Tasks with iPad tasks, facilitating a learning platform that integrates early intervention into the child's daily life. The system's construction of stimulus complexity relative to task is evaluated by therapists, together with field trials evaluating both the integrity of the instructional design and the goal of stimulus presentation and adjustment relative to performance for learning tasks. We show positive results across all our stakeholders: children, parents and therapists. Our results have implications for other early learning fields that require principled ways to construct lessons across skills and adjust stimuli relative to performance.
    @ARTICLE { venkatesh_greenhill_phung_adams_duong_pmc12,
        TITLE = { Pervasive multimedia for autism intervention },
        AUTHOR = { Venkatesh, S. and Greenhill, S. and Phung, D. and Adams, B. and Duong, T. },
        JOURNAL = { Pervasive and Mobile Computing (PMC) },
        YEAR = { 2012 },
        NUMBER = { 6 },
        PAGES = { 863--882 },
        VOLUME = { 8 },
        ABSTRACT = { There is a growing gap between the number of children with autism requiring early intervention and the therapy available. We present a portable platform for pervasive delivery of early intervention therapy using multi-touch interfaces and principled ways to deliver stimuli of increasing complexity and adapt to a child's performance. Our implementation weaves Natural Environment Tasks with iPad tasks, facilitating a learning platform that integrates early intervention into the child's daily life. The system's construction of stimulus complexity relative to task is evaluated by therapists, together with field trials evaluating both the integrity of the instructional design and the goal of stimulus presentation and adjustment relative to performance for learning tasks. We show positive results across all our stakeholders: children, parents and therapists. Our results have implications for other early learning fields that require principled ways to construct lessons across skills and adjust stimuli relative to performance. },
        OWNER = { dinh },
        PUBLISHER = { Elsevier },
        TIMESTAMP = { 2012.08.02 },
    }
J
  • Social Reader: Towards browsing the Social Web
    Adams, B., Phung, D. and Venkatesh, S.. Multimedia Tools and Applications, June 2012. [ | | pdf]
    We describe Social Reader, a feed-reader-plus-social-network aggregator that mines comments from social media in order to display a user's relational neighborhood as a navigable social network. Social Reader's network visualization enhances mutual awareness of blogger communities, facilitates their exploration and growth with a fully drag-and-drop interface, and provides novel ways to filter and summarize people, groups, blogs and comments. We discuss the architecture behind the reader, highlight tasks it adds to the workflow of a typical reader, and assess their cost. We also explore the potential of mood-based features in social media applications. Mood is particularly relevant to social media, reflecting the personal nature of the medium. We explore two prototype mood-based features: colour-coding the mood of recent posts according to a valence/arousal map, and a mood-based abstract of recent activity using image media. A six-week study of the software involving 20 users confirmed the usefulness of the novel visual display, via a quantitative analysis of use logs and an exit survey.
    @ARTICLE { adams_phung_venkatesh_mtap12,
        TITLE = { Social Reader: Towards browsing the Social Web },
        AUTHOR = { Adams, B. and Phung, D. and Venkatesh, S. },
        JOURNAL = { Multimedia Tools and Applications },
        YEAR = { 2012 },
        MONTH = { June },
        PAGES = { 1-40 },
        ABSTRACT = { We describe Social Reader, a feed-reader-plus-social-network aggregator that mines comments from social media in order to display a user's relational neighborhood as a navigable social network. Social Reader's network visualization enhances mutual awareness of blogger communities, facilitates their exploration and growth with a fully drag-and-drop interface, and provides novel ways to filter and summarize people, groups, blogs and comments. We discuss the architecture behind the reader, highlight tasks it adds to the workflow of a typical reader, and assess their cost. We also explore the potential of mood-based features in social media applications. Mood is particularly relevant to social media, reflecting the personal nature of the medium. We explore two prototype mood-based features: colour-coding the mood of recent posts according to a valence/arousal map, and a mood-based abstract of recent activity using image media. A six-week study of the software involving 20 users confirmed the usefulness of the novel visual display, via a quantitative analysis of use logs and an exit survey. },
        DOI = { 10.1007/s11042-012-1138-5 },
        FILE = { :papers\\phung\\adams_phung_venkatesh_mtap12.pdf:PDF },
        OWNER = { dinh },
        TIMESTAMP = { 2012.05.24 },
        URL = { http://www.springerlink.com/content/3k230432w50443l0/fulltext.pdf },
    }
J
  • Event Extraction Using Behaviors of Sentiment Signals and Burst Structure in Social Media
    Nguyen, T., Phung, D., Adams, B. and Venkatesh, S.. Knowledge and Information Systems, October 2012. [ | ]
    Significant world events often cause the behavioral convergence of the expression of shared sentiment. This paper examines the use of the blogosphere as a framework to study user psychological behaviours, using their sentiment responses as a form of ‘sensor’ to automatically infer real-world events of importance. We formulate a novel temporal sentiment index function using a quantitative measure of the valence value of affective bearing words in blog posts, where the set of affective bearing words is inspired by psychological research on emotion structure. The annual local minima and maxima of the proposed sentiment signal function are utilized to extract significant events of the year, and the corresponding blog posts are further analyzed using topic modelling tools to understand their content. The paper then examines the correlation of the discovered topics with world news events reported by the mainstream news service provider, Cable News Network (CNN), and by the Google search engine. Next, aiming at understanding sentiment at a finer granularity over time, we propose a stochastic burst detection model, extended from the work of Kleinberg, that works incrementally with stream data. The proposed model is then used to extract sentiment bursts occurring within a specific mood label (for example, a burst of observing ‘shocked’). The blog posts at those time indices are analyzed to extract topics, which are compared to real-world news events. Our comprehensive set of experiments on a large-scale set of 12 million posts from Livejournal shows that the proposed sentiment index (SI) function coincides well with significant world events, while bursts in sentiment allow us to locate finer-grained external world events.
    @ARTICLE { nguyen_phung_adams_venkatesh_kais12,
        TITLE = { Event Extraction Using Behaviors of Sentiment Signals and Burst Structure in Social Media },
        AUTHOR = { Nguyen, T. and Phung, D. and Adams, B. and Venkatesh, S. },
        JOURNAL = { Knowledge and Information Systems },
        YEAR = { 2012 },
        MONTH = { October },
        PAGES = { 1-26 },
        ABSTRACT = { Significant world events often cause the behavioral convergence of the expression of shared sentiment. This paper examines the use of the blogosphere as a framework to study user psychological behaviours, using their sentiment responses as a form of ‘sensor’ to automatically infer real-world events of importance. We formulate a novel temporal sentiment index function using a quantitative measure of the valence value of affective bearing words in blog posts, where the set of affective bearing words is inspired by psychological research on emotion structure. The annual local minima and maxima of the proposed sentiment signal function are utilized to extract significant events of the year, and the corresponding blog posts are further analyzed using topic modelling tools to understand their content. The paper then examines the correlation of the discovered topics with world news events reported by the mainstream news service provider, Cable News Network (CNN), and by the Google search engine. Next, aiming at understanding sentiment at a finer granularity over time, we propose a stochastic burst detection model, extended from the work of Kleinberg, that works incrementally with stream data. The proposed model is then used to extract sentiment bursts occurring within a specific mood label (for example, a burst of observing ‘shocked’). The blog posts at those time indices are analyzed to extract topics, which are compared to real-world news events. Our comprehensive set of experiments on a large-scale set of 12 million posts from Livejournal shows that the proposed sentiment index (SI) function coincides well with significant world events, while bursts in sentiment allow us to locate finer-grained external world events. },
    }
J
  • Detection of Cross-Channel Anomalies
    Pham, S., Budhaditya, S., Phung, D. and Venkatesh, S.. Knowledge and Information Systems (KAIS), June 2012. [ | ]
    The data deluge has created a great challenge for data mining applications, wherein the rare topics of interest are often buried in the flood of major headlines. We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual channel anomalies and are often portents of significant events. Central to this new problem is the development of a theoretical foundation and methodology. Using the spectral approach, we propose a two-stage detection method: anomaly detection at the single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. We also derive an extension of the proposed detection method to an online setting, which automatically adapts to changes in the data over time at low computational complexity using incremental algorithms. Our mathematical analysis shows that our method is likely to reduce the false alarm rate, by establishing theoretical results on the reduction of an impurity index. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in large-scale video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis.
    @ARTICLE { pham_budhaditya_phung_venkatesh_kais12,
        TITLE = { Detection of Cross-Channel Anomalies },
        AUTHOR = { Pham, S. and Budhaditya, S. and Phung, D. and Venkatesh, S. },
        JOURNAL = { Knowledge and Information Systems (KAIS) },
        YEAR = { 2012 },
        MONTH = { June },
        PAGES = { 1-27 },
        ABSTRACT = { The data deluge has created a great challenge for data mining applications, wherein the rare topics of interest are often buried in the flood of major headlines. We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual channel anomalies and are often portents of significant events. Central to this new problem is the development of a theoretical foundation and methodology. Using the spectral approach, we propose a two-stage detection method: anomaly detection at the single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. We also derive an extension of the proposed detection method to an online setting, which automatically adapts to changes in the data over time at low computational complexity using incremental algorithms. Our mathematical analysis shows that our method is likely to reduce the false alarm rate, by establishing theoretical results on the reduction of an impurity index. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in large-scale video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis. },
    }
J
  • A Nonparametric Bayesian Poisson Gamma Model for Count Data
    Gupta, S., Phung, D. and Venkatesh, S.. In Proceedings of International Conference on Pattern Recognition (ICPR), pages 1815-1818, 2012. [ | ]
    We propose a nonparametric Bayesian, linear Poisson gamma model for modeling count data and use it for dictionary learning. A key property of this model is that it captures the parts-based representation similar to nonnegative matrix factorization. We present an auxiliary variable Gibbs sampler, which turns the intractable inference into a tractable one. Combining this inference procedure with the slice sampler, we show that our model can learn the number of factors automatically from the data. The proposed model has been demonstrated using both synthetic and real-world datasets for dictionary learning applications.
    @INPROCEEDINGS { gupta_phung_venkatesh_icpr12,
        TITLE = { A Nonparametric {B}ayesian {P}oisson {G}amma Model for Count Data },
        AUTHOR = { Gupta, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of International Conference on Pattern Recognition (ICPR) },
        YEAR = { 2012 },
        PAGES = { 1815-1818 },
        ABSTRACT = { We propose a nonparametric Bayesian, linear Poisson gamma model for modeling count data and use it for dictionary learning. A key property of this model is that it captures the parts-based representation similar to nonnegative matrix factorization. We present an auxiliary variable Gibbs sampler, which turns the intractable inference into a tractable one. Combining this inference procedure with the slice sampler, we show that our model can learn the number of factors automatically from the data. The proposed model has been demonstrated using both synthetic and real-world datasets for dictionary learning applications. },
        OWNER = { dinh },
        TIMESTAMP = { 2012.06.26 },
    }
C
  • Multi-modal Abnormality Detection in Video with Unknown Data Segmentation
    Nguyen, Tien Vu, Phung, Dinh, Rana, Santu, Pham, Duc Son and Venkatesh, Svetha. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR), pages 1322-1325, Tsukuba, Japan. IEEE, November 2012. [ | ]
    This paper examines a new problem in large-scale stream data: abnormality detection that is localised to a data segmentation process. Unlike traditional abnormality detection methods, which typically build one unified model across the data stream, we propose that building multiple detection models focused on different coherent sections of the video stream results in better detection performance. One key challenge is to segment the data into coherent sections, as the number of segments is not known in advance and can vary greatly across cameras; a principled approach is required. To this end, we first employ the recently proposed infinite HMM and collapsed Gibbs inference to automatically infer the data segmentation, followed by constructing abnormality detection models localised to each segment. We demonstrate the superior performance of the proposed framework on real-world surveillance camera data over 14 days.
    @INPROCEEDINGS { nguyen_phung_rana_pham_venkatesh_icpr12,
        TITLE = { Multi-modal Abnormality Detection in Video with Unknown Data Segmentation },
        AUTHOR = { Nguyen, Tien Vu and Phung, Dinh and Rana, Santu and Pham, Duc Son and Venkatesh, Svetha },
        BOOKTITLE = { Proceedings of the 21st International Conference on Pattern Recognition (ICPR) },
        YEAR = { 2012 },
        ADDRESS = { Tsukuba, Japan },
        MONTH = { November },
        ORGANIZATION = { IEEE },
        PAGES = { 1322--1325 },
        ABSTRACT = { This paper examines a new problem in large-scale stream data: abnormality detection that is localised to a data segmentation process. Unlike traditional abnormality detection methods, which typically build one unified model across the data stream, we propose that building multiple detection models focused on different coherent sections of the video stream results in better detection performance. One key challenge is to segment the data into coherent sections, as the number of segments is not known in advance and can vary greatly across cameras; a principled approach is required. To this end, we first employ the recently proposed infinite HMM and collapsed Gibbs inference to automatically infer the data segmentation, followed by constructing abnormality detection models localised to each segment. We demonstrate the superior performance of the proposed framework on real-world surveillance camera data over 14 days. },
        OWNER = { dinh },
        TIMESTAMP = { 2012.06.26 },
    }
C
  • Embedded Restricted Boltzmann Machines for Fusion of Mixed Data
    Truyen, T., Phung, D. and Venkatesh, S.. In Proc. of IEEE Int. Conf. on Fusion (FUSION), Singapore, July 2012. [ | ]
    @INPROCEEDINGS { truyen_phung_venkatesh_fusion12,
        TITLE = { Embedded Restricted Boltzmann Machines for Fusion of Mixed Data },
        AUTHOR = { Truyen, T. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proc. of IEEE Int. Conf. on Fusion (FUSION) },
        YEAR = { 2012 },
        ADDRESS = { Singapore },
        MONTH = { July },
        OWNER = { dinh },
        TIMESTAMP = { 2012.05.24 },
    }
C
  • Learning Boltzmann distance metric for face recognition
    Truyen, T., Phung, D. and Venkatesh, S.. In Proc. of IEEE International Conference on Multimedia and Expo (ICME), Melbourne, Australia, July 2012. [ | ]
    @INPROCEEDINGS { truyen_phung_venkatesh_icme12,
        TITLE = { Learning Boltzmann distance metric for face recognition },
        AUTHOR = { Truyen, T. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proc. of IEEE International Conference on Multimedia and Expo (ICME) },
        YEAR = { 2012 },
        ADDRESS = { Melbourne, Australia },
        MONTH = { July },
        OWNER = { thinng },
        TIMESTAMP = { 2012.04.11 },
    }
C
  • Funniest Thing I've Seen Since: Shifting Perspectives from Multimedia Artefacts to Utterances
    Adams, B., Phung, D. and Venkatesh, S.. In Proceedings of ACM Workshop on Socially-Aware Multimedia, in conjunction with ACM Int. Conf on Multimedia (ACM-MM), Nara, Japan, October 2012. [ | ]
    @CONFERENCE { adams_phung_venkatesh_acmmm12,
        TITLE = { Funniest Thing I've Seen Since: Shifting Perspectives from Multimedia Artefacts to Utterances },
        AUTHOR = { Adams, B. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of ACM Workshop on Socially-Aware Multimedia, in conjunction with ACM Int. Conf on Multimedia (ACM-MM) },
        YEAR = { 2012 },
        ADDRESS = { Nara, Japan },
        MONTH = { October },
        FILE = { :papers\\phung\\adams_phung_venkatesh_acmmm12.pdf:PDF;:papers\\phung\\adams_phung_venkatesh_acmmm12.pptx:OpenDocument presentation },
        TIMESTAMP = { 2012.10.31 },
    }
C
  • Large-Scale Statistical Modeling of Motion Patterns: A Bayesian Nonparametric Approach
    Rana, S., Phung, D., Pham, S. and Venkatesh, S.. In Proceedings of Indian Conference on Vision, Graphics and Image Processing, India, December 2012. [ | ]
    We propose a novel framework for large-scale scene understanding in static camera surveillance. Our techniques combine fast rank-1 constrained robust PCA to compute the foreground, with non-parametric Bayesian models for inference. Clusters are extracted in foreground patterns using a joint multinomial+Gaussian Dirichlet process model (DPM). Since the multinomial distribution is normalized, the Gaussian mixture distinguishes between similar spatial patterns but different activity levels (e.g. car vs. bike). We propose a modification of the decayed MCMC technique for incremental inference, providing the ability to discover theoretically unlimited patterns in unbounded video streams. A promising by-product of our framework is online, abnormal activity detection. A benchmark video and two surveillance videos, with the longest being 140 hours long, are used in our experiments. The patterns discovered are as informative as existing scene understanding algorithms. However, unlike existing work, we achieve near real-time execution and encouraging performance in abnormal activity detection.
    @INPROCEEDINGS { rana_phung_pham_venkatesh_civgip12,
        TITLE = { Large-Scale Statistical Modeling of Motion Patterns: A Bayesian Nonparametric Approach },
        AUTHOR = { Rana, S. and Phung, D. and Pham, S. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of Indian Conference on Vision, Graphics and Image Processing },
        YEAR = { 2012 },
        ADDRESS = { India },
        MONTH = { December },
        ABSTRACT = { We propose a novel framework for large-scale scene understanding in static camera surveillance. Our techniques combine fast rank-1 constrained robust PCA to compute the foreground, with non-parametric Bayesian models for inference. Clusters are extracted in foreground patterns using a joint multinomial+Gaussian Dirichlet process model (DPM). Since the multinomial distribution is normalized, the Gaussian mixture distinguishes between similar spatial patterns but different activity levels (e.g. car vs. bike). We propose a modification of the decayed MCMC technique for incremental inference, providing the ability to discover theoretically unlimited patterns in unbounded video streams. A promising by-product of our framework is online, abnormal activity detection. A benchmark video and two surveillance videos, with the longest being 140 hours long, are used in our experiments. The patterns discovered are as informative as existing scene understanding algorithms. However, unlike existing work, we achieve near real-time execution and encouraging performance in abnormal activity detection. },
        FILE = { :papers\\phung\\rana_phung_pham_venkatesh_civgip12.pdf:PDF },
        OWNER = { dinh },
        TIMESTAMP = { 2012.10.31 },
    }
C
  • TOBY playpad - an accelerated learning tool for children with autism
    Duong, T., Venkatesh, S., Phung, D., Greenhill, S. and Adams, B.. Autism Spectrum Disorder Research Forum, Melbourne, Australia, November 2012. [ | ]
    The diagnosis of children with Autism Spectrum Disorder (ASD) is on the rise and it is well-known that early intervention is critical. However, there is often a long gap of waiting and wasted time before and between a “formal” diagnosis and therapy. The aim of TOBY Playpad (www.tobyplaypad.com) is to close this gap by empowering parents to help their children early and naturally at home and in their daily activities. The current form of TOBY is an iPad application that provides an adaptive syllabus of more than 200 activities, developed by autism and machine learning experts, to target key development areas known to be deficits for children with ASD, such as imitation, joint attention and language. TOBY delivers lessons, materials, instructions and interactions for both on-iPad and off-iPad Natural Environment Tasks (NET) activities. Since each child is different, TOBY is highly adaptive and personalized, intelligently increasing its complexity and varying prompts and reinforcements as the child progresses over time. Prompting and reinforcing strategies are also recommended for parents to make the most of everyday opportunities to teach children. Essentially, TOBY removes the burden on parents of extensive preparation of materials and manual data recording. We have conducted two trials, on 20 and 50 children, with AutismWest (www.autismwest.org.au) since last year. The results are promising, providing evidence of learning in skills that were not previously present in some children. NET activities are shown to be effective for children and popular with parents.
    @MISC { duong_venkatesh_phung_greenhill_adams_vac12,
        TITLE = { {TOBY} playpad - an accelerated learning tool for children with autism },
        AUTHOR = { Duong, T. and Venkatesh, S. and Phung, D. and Greenhill, S. and Adams, B. },
        HOWPUBLISHED = { Autism Spectrum Disorder Research Forum, Melbourne, Australia },
        MONTH = { November },
        YEAR = { 2012 },
        ABSTRACT = { The diagnosis of children with Autism Spectrum Disorder (ASD) is on the rise and it is well-known that early intervention is critical. However, there is often a long gap of waiting and wasted time before and between a “formal” diagnosis and therapy. The aim of TOBY Playpad (www.tobyplaypad.com) is to close this gap by empowering parents to help their children early and naturally at home and in their daily activities. The current form of TOBY is an iPad application that provides an adaptive syllabus of more than 200 activities, developed by autism and machine learning experts, to target key development areas known to be deficits for children with ASD, such as imitation, joint attention and language. TOBY delivers lessons, materials, instructions and interactions for both on-iPad and off-iPad Natural Environment Tasks (NET) activities. Since each child is different, TOBY is highly adaptive and personalized, intelligently increasing its complexity and varying prompts and reinforcements as the child progresses over time. Prompting and reinforcing strategies are also recommended for parents to make the most of everyday opportunities to teach children. Essentially, TOBY removes the burden on parents of extensive preparation of materials and manual data recording. We have conducted two trials, on 20 and 50 children, with AutismWest (www.autismwest.org.au) since last year. The results are promising, providing evidence of learning in skills that were not previously present in some children. NET activities are shown to be effective for children and popular with parents. },
        FILE = { :papers\\phung\\duong_venkatesh_phung_greenhill_adams_vac12.pdf:PDF },
        OWNER = { dinh },
        TIMESTAMP = { 2012.10.31 },
    }
  • TOBY Playpad: Early Intervention in Autism through Technology
    Venkatesh, S., Phung, D., Greenhill, S., Duong, T. and Adams, B.. Technical report, Pattern Recognition and Data Analytics (PRaDA), Deakin University, Australia, 2012. [ | ]
    @TECHREPORT { venkatesh_phung_greenhill_duong_adams_tr12,
        TITLE = { {TOBY Playpad}: Early Intervention in Autism through Technology },
        AUTHOR = { Venkatesh, S. and Phung, D. and Greenhill, S. and Duong, T. and Adams, B. },
        INSTITUTION = { Pattern Recognition and Data Analytics (PRaDA), Deakin University, Australia },
        YEAR = { 2012 },
        OWNER = { thinng },
        TIMESTAMP = { 2013.01.07 },
    }
R
  • A Bayesian nonparametric, joint modeling of multiple related data sources using restricted hierarchical Beta process
    Gupta, S.K., Phung, D. and Venkatesh, S.. Technical report, Pattern Recognition and Data Analytics (PRaDA), 2012. [ | ]
    Unsupervised joint modeling of multiple data sources is desired in many machine learning applications. When the sources are related and share underlying patterns, a joint factor analysis exploiting their statistical sharing can improve the performance of unsupervised learning tasks. A Bayesian nonparametric approach to the problem is to use hierarchical beta process (HBP), which has been shown to be useful for this task. However, the inference of HBP based models is intractable and the current methods require a series of approximations. In this paper, we present a tractable inference for a modified HBP prior without requiring any approximations. We derive a slice sampler, which keeps the inference tractable even when the likelihood and the prior over parameters are non-conjugate. This allows the application of HBP based models in much wider contexts without restrictions. We apply the modified hierarchical prior for joint factor analysis of multiple related data sources and show encouraging transfer learning results. We extend the prior by incorporating a Markov structure on latent variables and use it for jointly modeling multiple related time series. We examine the benefits of our proposed models for text modeling and image retrieval using both synthetic and real-world datasets, and demonstrate promising results on modeling multiple time-series.
    @TECHREPORT { gupta_phung_venkatesh_tr12,
        TITLE = { A {B}ayesian nonparametric, joint modeling of multiple related data sources using restricted hierarchical {B}eta process },
        AUTHOR = { Gupta, S.K. and Phung, D. and Venkatesh, S. },
        INSTITUTION = { Pattern Recognition and Data Analytics (PRaDA) },
        YEAR = { 2012 },
        ABSTRACT = { Unsupervised joint modeling of multiple data sources is desired in many machine learning applications. When the sources are related and share underlying patterns, a joint factor analysis exploiting their statistical sharing can improve the performance of unsupervised learning tasks. A Bayesian nonparametric approach to the problem is to use hierarchical beta process (HBP), which has been shown to be useful for this task. However, the inference of HBP based models is intractable and the current methods require a series of approximations. In this paper, we present a tractable inference for a modified HBP prior without requiring any approximations. We derive a slice sampler, which keeps the inference tractable even when the likelihood and the prior over parameters are non-conjugate. This allows the application of HBP based models in much wider contexts without restrictions. We apply the modified hierarchical prior for joint factor analysis of multiple related data sources and show encouraging transfer learning results. We extend the prior by incorporating a Markov structure on latent variables and use it for jointly modeling multiple related time series. We examine the benefits of our proposed models for text modeling and image retrieval using both synthetic and real-world datasets, and demonstrate promising results on modeling multiple time-series. },
        OWNER = { sunilg },
        TIMESTAMP = { 2012.11.26 },
        UNIVERSITY = { Center for Pattern Recognition and Data Analytics (PRaDA), Deakin University, Technical report : TR-PRaDA-04-12 },
    }
R
  • An Analysis of Suicide Risk Assessment
    Rana, S., Gupta, S., Venkatesh, S., Berk, M., Harvey, R., Phung, D., Saha, B., Nguyen, T., Truyen, T. and Luo, W.. Technical report, Pattern Recognition and Data Analytics, Deakin University, 2012. [ | ]
    @TECHREPORT { rana_gupta_venkatesh_berk_harvey_phung_saha_nguyen_truyen_luo_tr12,
        TITLE = { An Analysis of Suicide Risk Assessment },
        AUTHOR = { Rana, S. and Gupta, S. and Venkatesh, S. and Berk, M. and Harvey, R. and Phung, D. and Saha, B. and Nguyen, T. and Truyen, T. and Luo, W. },
        INSTITUTION = { {Pattern Recognition and Data Analytics, Deakin University} },
        YEAR = { 2012 },
        OWNER = { santu },
        TIMESTAMP = { 2012.10.31 },
    }
R
2011
  • Eventscapes: Visualizing Events over Time with Emotive Facets (short paper)
    B. Adams, D. Phung and S. Venkatesh. In ACM Int. Conference on Multimedia, Arizona, USA, November 2011. [ | | pdf]
    The scale and dynamicity of social media, and interaction between traditional news sources and online communities, have created challenges for information retrieval approaches. Users may have no clear information need or be unable to express it in the appropriate idiom, requiring instead to be oriented in an unfamiliar domain, to explore and learn. We present a novel data-driven visualization, termed Eventscape, that combines time, visual media, mood, and controversy. Formative evaluation highlights the value of emotive facets for rapid evaluation of mixed news and social media topics, and a role for such visualizations as precursors to deeper search.
    @INPROCEEDINGS { adams_phung_venkatesh_acmmm11,
        TITLE = { Eventscapes: Visualizing Events over Time with Emotive Facets (short paper) },
        AUTHOR = { B. Adams and D. Phung and S. Venkatesh },
        BOOKTITLE = { ACM Int. Conference on Multimedia, Arizona, USA },
        YEAR = { 2011 },
        MONTH = { November },
        ABSTRACT = { The scale and dynamicity of social media, and interaction between traditional news sources and online communities, have created challenges for information retrieval approaches. Users may have no clear information need or be unable to express it in the appropriate idiom, requiring instead to be oriented in an unfamiliar domain, to explore and learn. We present a novel data-driven visualization, termed Eventscape, that combines time, visual media, mood, and controversy. Formative evaluation highlights the value of emotive facets for rapid evaluation of mixed news and social media topics, and a role for such visualizations as precursors to deeper search. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\adams_phung_venkatesh_acmmm11.pdf:PDF },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Adams_etal_acmmm11.pdf },
    }
C
  • A Bayesian Framework for Learning Shared and Individual Subspaces from Multiple Data Sources
    Gupta, S., Phung, D., Adams, B. and Venkatesh, S.. In Advances in Knowledge Discovery and Data Mining, pages 136-147. Springer Berlin / Heidelberg, 2011. (DOI: 10.1007/978-3-642-20841-6_12). [ | | pdf]
    This paper presents a novel Bayesian formulation to exploit shared structures across multiple data sources, constructing foundations for effective mining and retrieval across disparate domains. We jointly analyze diverse data sources using a unifying piece of metadata (textual tags). We propose a method based on Bayesian Probabilistic Matrix Factorization (BPMF) which is able to explicitly model the partial knowledge common to the datasets using shared subspaces and the knowledge specific to each dataset using individual subspaces. For the proposed model, we derive an efficient algorithm for learning the joint factorization based on Gibbs sampling. The effectiveness of the model is demonstrated by social media retrieval tasks across single and multiple media. The proposed solution is applicable to a wider context, providing a formal framework suitable for exploiting individual as well as mutual knowledge present across heterogeneous data sources of many kinds.
    @INCOLLECTION { gupta_phung_adams_venkatesh_pakddm11,
        TITLE = { A Bayesian Framework for Learning Shared and Individual Subspaces from Multiple Data Sources },
        AUTHOR = { Gupta, S. and Phung, D. and Adams, B. and Venkatesh, S. },
        BOOKTITLE = { Advances in Knowledge Discovery and Data Mining },
        PUBLISHER = { Springer Berlin / Heidelberg },
        YEAR = { 2011 },
        EDITOR = { Huang, Joshua and Cao, Longbing and Srivastava, Jaideep },
        NOTE = { 10.1007/978-3-642-20841-6_12 },
        PAGES = { 136-147 },
        SERIES = { Lecture Notes in Computer Science },
        VOLUME = { 6634 },
        ABSTRACT = { This paper presents a novel Bayesian formulation to exploit shared structures across multiple data sources, constructing foundations for effective mining and retrieval across disparate domains. We jointly analyze diverse data sources using a unifying piece of metadata (textual tags). We propose a method based on Bayesian Probabilistic Matrix Factorization (BPMF) which is able to explicitly model the partial knowledge common to the datasets using shared subspaces and the knowledge specific to each dataset using individual subspaces. For the proposed model, we derive an efficient algorithm for learning the joint factorization based on Gibbs sampling. The effectiveness of the model is demonstrated by social media retrieval tasks across single and multiple media. The proposed solution is applicable to a wider context, providing a formal framework suitable for exploiting individual as well as mutual knowledge present across heterogeneous data sources of many kinds. },
        AFFILIATION = { Department of Computing, Curtin University, Perth, Australia },
        COMMENT = { coauthor },
        ISBN = { 978-3-642-20840-9 },
        KEYWORD = { Computer Science },
        URL = { http://dx.doi.org/10.1007/978-3-642-20841-6_12 },
    }
BC
  • A Bayesian Framework for Learning Shared and Individual Subspaces from Multiple Data Sources
    Gupta, S., Phung, D., Adams, B. and Venkatesh, S.. In Proc. of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Shenzhen, China, May 2011. [ | | pdf]
    This paper presents a novel Bayesian formulation to exploit shared structures across multiple data sources, constructing foundations for effective mining and retrieval across disparate domains. We jointly analyze diverse data sources using a unifying piece of metadata (textual tags). We propose a method based on Bayesian Probabilistic Matrix Factorization (BPMF) which is able to explicitly model the partial knowledge common to the datasets using shared subspaces and the knowledge specific to each dataset using individual subspaces. For the proposed model, we derive an efficient algorithm for learning the joint factorization based on Gibbs sampling. The effectiveness of the model is demonstrated by social media retrieval tasks across single and multiple media. The proposed solution is applicable to a wider context, providing a formal framework suitable for exploiting individual as well as mutual knowledge present across heterogeneous data sources of many kinds.
    @INPROCEEDINGS { gupta_phung_adams_venkatesh_pakdd11,
        TITLE = { A {B}ayesian Framework for Learning Shared and Individual Subspaces from Multiple Data Sources },
        AUTHOR = { Gupta, S. and Phung, D. and Adams, B. and Venkatesh, S. },
        BOOKTITLE = { Proc. of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2011 },
        ADDRESS = { Shenzhen, China },
        MONTH = { May },
        ABSTRACT = { This paper presents a novel Bayesian formulation to exploit shared structures across multiple data sources, constructing foundations for effective mining and retrieval across disparate domains. We jointly analyze diverse data sources using a unifying piece of metadata (textual tags). We propose a method based on Bayesian Probabilistic Matrix Factorization (BPMF) which is able to explicitly model the partial knowledge common to the datasets using shared subspaces and the knowledge specific to each dataset using individual subspaces. For the proposed model, we derive an efficient algorithm for learning the joint factorization based on Gibbs sampling. The effectiveness of the model is demonstrated by social media retrieval tasks across single and multiple media. The proposed solution is applicable to a wider context, providing a formal framework suitable for exploiting individual as well as mutual knowledge present across heterogeneous data sources of many kinds. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\gupta_phung_adams_venkatesh_pakdd11.pdf:PDF },
        OWNER = { 184698H },
        TIMESTAMP = { 2011.02.07 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Gupta_etal_pakdd11.pdf },
    }
C
  • A Matrix Factorization Framework for Jointly Analyzing Multiple Nonnegative Data Sources
    Gupta, S., Phung, D., Adams, B. and Venkatesh, S.. In Procs. of Text Mining Workshop, in conjunction with SIAM Int. Conf. on Data Mining, Arizona, USA, April 2011. [ | | pdf]
    Nonnegative matrix factorization based methods provide one of the simplest and most effective approaches to text mining. However, their applicability is mainly limited to analyzing a single data source. In this paper, we propose a novel joint matrix factorization framework which can jointly analyze multiple data sources by exploiting their shared and individual structures. The proposed framework is flexible to handle any arbitrary sharing configurations encountered in real world data. We derive an efficient algorithm for learning the factorization and show that its convergence is theoretically guaranteed. We demonstrate the utility and effectiveness of the proposed framework in two real-world applications: improving social media retrieval using auxiliary sources, and cross-social media retrieval. Representing each social media source using their textual tags, for both applications, we show that retrieval performance exceeds the existing state-of-the-art techniques. The proposed solution provides a generic framework and is applicable to a wider context in data mining, wherever one needs to exploit mutual and individual knowledge present across multiple data sources.
    @INPROCEEDINGS { gupta_phung_adams_venkatesh_tmw11,
        TITLE = { A Matrix Factorization Framework for Jointly Analyzing Multiple Nonnegative Data Sources },
        AUTHOR = { Gupta, S. and Phung, D. and Adams, B. and Venkatesh, S. },
        BOOKTITLE = { Procs. of Text Mining Workshop, in conjunction with SIAM Int. Conf. on Data Mining },
        YEAR = { 2011 },
        ADDRESS = { Arizona, USA },
        MONTH = { April },
        ABSTRACT = { Nonnegative matrix factorization based methods provide one of the simplest and most effective approaches to text mining. However, their applicability is mainly limited to analyzing a single data source. In this paper, we propose a novel joint matrix factorization framework which can jointly analyze multiple data sources by exploiting their shared and individual structures. The proposed framework is flexible to handle any arbitrary sharing configurations encountered in real world data. We derive an efficient algorithm for learning the factorization and show that its convergence is theoretically guaranteed. We demonstrate the utility and effectiveness of the proposed framework in two real-world applications: improving social media retrieval using auxiliary sources, and cross-social media retrieval. Representing each social media source using their textual tags, for both applications, we show that retrieval performance exceeds the existing state-of-the-art techniques. The proposed solution provides a generic framework and is applicable to a wider context in data mining, wherever one needs to exploit mutual and individual knowledge present across multiple data sources. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\gupta_phung_adams_venkatesh_tmw11.pdf:PDF },
        OWNER = { 184698H },
        TIMESTAMP = { 2011.02.07 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Gupta_etal_twm11.pdf },
    }
C
  • A Context-Sensitive Device to Help People with Autism Cope with Anxiety
    Mohammedali, M., Phung, D., Adams, B. and Venkatesh, S.. In Procs. of ACM Conf. on Human Factors in Computing Systems (CHI) (work-in-progress track), Vancouver, Canada, May 2011. [ | | pdf]
    We describe a smartphone application that helps people with Autism Spectrum Disorder (ASD) cope with anxiety attacks. Our prototype provides a one-touch interface for indicating a panic level. The device’s response—to instruct, soothe, and/or contact carers—is sensitive to the user’s context, consisting of time, location, ambient noise, and nearby friends. Formative evaluation unearths a critical challenge to building assistive technologies for ASD sufferers: can regimented interfaces foster flexible behaviour? Our observations suggest that a delicate balance of design goals is required for a viable assistive technology.
    @INPROCEEDINGS { mohammedali_phung_adams_venkatesh_chi11,
        TITLE = { A Context-Sensitive Device to Help People with Autism Cope with Anxiety },
        AUTHOR = { Mohammedali, M. and Phung, D. and Adams, B. and Venkatesh, S. },
        BOOKTITLE = { Procs. of ACM Conf. on Human Factors in Computing Systems (CHI) (work-in-progress track) },
        YEAR = { 2011 },
        ADDRESS = { Vancouver, Canada },
        MONTH = { May },
        ABSTRACT = { We describe a smartphone application that helps people with Autism Spectrum Disorder (ASD) cope with anxiety attacks. Our prototype provides a one-touch interface for indicating a panic level. The device’s response—to instruct, soothe, and/or contact carers—is sensitive to the user’s context, consisting of time, location, ambient noise, and nearby friends. Formative evaluation unearths a critical challenge to building assistive technologies for ASD sufferers: can regimented interfaces foster flexible behaviour? Our observations suggest that a delicate balance of design goals is required for a viable assistive technology. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\mohammedali_phung_adams_venkatesh_chi11.pdf:PDF;:phung\\mohammedali_phung_adams_venkatesh_chi11.doc:Word },
        OWNER = { 184698H },
        TIMESTAMP = { 2011.02.16 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Moham_etal_chi11_poster.pdf http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Moham_etal_chi11.doc },
    }
C
  • Emotional Reactions to Real-World Events in Social Networks
    Nguyen, T., Phung, D., Adams, B. and Venkatesh, S. In Proc. of International Workshop on Behavior Informatics (BI-11), in conjunction with the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Shenzhen, China, May 2011. [ | | pdf]
    An important problem in text mining is to detect bursty events. For example, knowing when a topic rises, falls or fades away has important implications in text indexing and retrieval systems, in market and consumer predictions, and advertising strategy. We introduce a sentiment index, computed from the current mood tags in a collection of blog posts utilizing an affective lexicon, potentially revealing subtle events discussed in the blogosphere. We then develop a method for extracting events based on this index and its distribution. Our second contribution is the establishment of a new bursty structure in text streams termed a sentiment burst. We employ a stochastic model to detect bursty periods of moods and the events associated with them. Our results on a dataset of more than 12 million mood-tagged blog posts over a 4-year period have shown that our sentiment-based bursty events are indeed meaningful, in several ways.
    @INPROCEEDINGS { nguyen_phung_adams_venkatesh_bi11,
        TITLE = { Emotional Reactions to Real-World Events in Social Networks },
        AUTHOR = { Nguyen, T. and Phung, D. and Adams, B. and Venkatesh, S. },
        BOOKTITLE = { Proc. of International Workshop on Behavior Informatics (BI-11), in conjunction with the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) },
        YEAR = { 2011 },
        ADDRESS = { Shenzhen, China },
        MONTH = { May },
        ABSTRACT = { An important problem in text mining is to detect bursty events. For example, knowing when a topic rises, falls or fades away has important implications in text indexing and retrieval systems, in market and consumer predictions, and advertising strategy. We introduce a sentiment index, computed from the current mood tags in a collection of blog posts utilizing an affective lexicon, potentially revealing subtle events discussed in the blogosphere. We then develop a method for extracting events based on this index and its distribution. Our second contribution is the establishment of a new bursty structure in text streams termed a sentiment burst. We employ a stochastic model to detect bursty periods of moods and the events associated with them. Our results on a dataset of more than 12 million mood-tagged blog posts over a 4-year period have shown that our sentiment-based bursty events are indeed meaningful, in several ways. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\nguyen_phung_adams_venkatesh_bi11.pdf:PDF },
        OWNER = { 184698H },
        TIMESTAMP = { 2011.02.07 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Nguyen_etal_bi11.pdf },
    }
C
  • Towards Discovery of Influence and Personality Traits through Social Link Prediction
    Nguyen, T., Phung, D., Adams, B. and Venkatesh, S. In AAAI Int. Conf. on Weblogs and Social Media (ICWSM), pages 566-569, Barcelona, Spain, July 2011. [ | | pdf]
    Estimation of a person's influence and personality traits from social media data has many applications. We use social linkage criteria, such as number of followers and friends, as proxies to form corpora, from the popular blogging site LiveJournal, for examining two two-class classification problems: influential vs. non-influential, and extraversion vs. introversion. Classification is performed using automatically-derived psycholinguistic and mood-based features of a user's textual messages. We experiment with three sub-corpora of 10000 users each, and present the most effective predictors for each category. The best classification result, at 80%, is achieved using psycholinguistic features; e.g., influentials are found to use more complex language than non-influentials, and use more leisure-related terms.
    @INPROCEEDINGS { nguyen_phung_adams_venkatesh_icwsm11,
        TITLE = { Towards Discovery of Influence and Personality Traits through Social Link Prediction },
        AUTHOR = { Nguyen, T. and Phung, D. and Adams, B. and Venkatesh, S. },
        BOOKTITLE = { AAAI Int. Conf. on Weblogs and Social Media (ICWSM) },
        YEAR = { 2011 },
        ADDRESS = { Barcelona, Spain },
        MONTH = { July },
        PAGES = { 566--569 },
        ABSTRACT = { Estimation of a person's influence and personality traits from social media data has many applications. We use social linkage criteria, such as number of followers and friends, as proxies to form corpora, from the popular blogging site LiveJournal, for examining two two-class classification problems: influential vs. non-influential, and extraversion vs. introversion. Classification is performed using automatically-derived psycholinguistic and mood-based features of a user's textual messages. We experiment with three sub-corpora of 10000 users each, and present the most effective predictors for each category. The best classification result, at 80\%, is achieved using psycholinguistic features; e.g., influentials are found to use more complex language than non-influentials, and use more leisure-related terms. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\nguyen_phung_adams_venkatesh_icwsm11.pdf:PDF },
        OWNER = { 184698H },
        TIMESTAMP = { 2011.06.03 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Nguyen_etal_icwms11.pdf },
    }
C
  • Prediction of Age, Sentiment, and Connectivity from Social Media Text
    Nguyen, T., Phung, D., Adams, B. and Venkatesh, S. In Int. Conf. on Web Information System Engineering (WISE), pages 227-240, Sydney, Australia, October 2011. [ | | pdf]
    Social media corpora, including the textual output of blogs, forums, and messaging applications, provide fertile ground for linguistic analysis: material diverse in topic and style, and at Web scale. We investigate manifest properties of textual messages, including latent topics, psycholinguistic features, and author mood, of a large corpus of blog posts, to analyze the impact of age, emotion, and social connectivity. These properties are found to be significantly different across the examined cohorts, which suggest discriminative features for a number of useful classification tasks. We build binary classifiers for old versus young bloggers, social versus solo bloggers, and happy versus sad posts with high performance. Analysis of discriminative features shows that age turns upon choice of topic, whereas sentiment orientation is evidenced by linguistic style. Good prediction is achieved for social connectivity using topic and linguistic features, leaving tagged mood a modest role in all classifications.
    @INPROCEEDINGS { nguyen_phung_adams_venkatesh_wise11,
        TITLE = { Prediction of Age, Sentiment, and Connectivity from Social Media Text },
        AUTHOR = { Nguyen, T. and Phung, D. and Adams, B. and Venkatesh, S. },
        BOOKTITLE = { Int. Conf. on Web Information System Engineering (WISE) },
        YEAR = { 2011 },
        ADDRESS = { Sydney, Australia },
        MONTH = { October },
        PAGES = { 227--240 },
        ABSTRACT = { Social media corpora, including the textual output of blogs, forums, and messaging applications, provide fertile ground for linguistic analysis: material diverse in topic and style, and at Web scale. We investigate manifest properties of textual messages, including latent topics, psycholinguistic features, and author mood, of a large corpus of blog posts, to analyze the impact of age, emotion, and social connectivity. These properties are found to be significantly different across the examined cohorts, which suggest discriminative features for a number of useful classification tasks. We build binary classifiers for old versus young bloggers, social versus solo bloggers, and happy versus sad posts with high performance. Analysis of discriminative features shows that age turns upon choice of topic, whereas sentiment orientation is evidenced by linguistic style. Good prediction is achieved for social connectivity using topic and linguistic features, leaving tagged mood a modest role in all classifications. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\nguyen_phung_adams_venkatesh_WISE11.pdf:PDF },
        OWNER = { 14135433 },
        TIMESTAMP = { 2011.08.01 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Nguyen_etal_wise11.pdf },
    }
C
  • Detection of Cross-Channel Anomalies From Multiple Data Channels
    Pham, S., Budhaditya, S., Phung, D. and Venkatesh, S. In Proceedings of the IEEE International Conference on Data Mining (ICDM), Vancouver, Canada, December 2011. [ | ]
    We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual channel anomalies, and are often portents of significant events. Using spectral approaches, we propose a two-stage detection method: anomaly detection at a single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single channel anomalies. Our mathematical analysis shows that our method is likely to reduce the false alarm rate. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis.
    @INPROCEEDINGS { pham_budhaditya_phung_venkatesh_icdm11,
        TITLE = { Detection of Cross-Channel Anomalies From Multiple Data Channels },
        AUTHOR = { Pham, S. and Budhaditya, S. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of the IEEE International Conference on Data Mining (ICDM) },
        YEAR = { 2011 },
        ADDRESS = { Vancouver, Canada },
        MONTH = { December },
        ABSTRACT = { We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are common amongst the individual channel anomalies, and are often portents of significant events. Using spectral approaches, we propose a two-stage detection method: anomaly detection at a single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single channel anomalies. Our mathematical analysis shows that our method is likely to reduce the false alarm rate. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis. },
        COMMENT = { coauthor },
        OWNER = { thinng },
        TIMESTAMP = { 2012.04.11 },
    }
C
  • Mixed-variate restricted Boltzmann machines
    Tran, T., Phung, D.Q. and Venkatesh, S. In Proc. of 3rd Asian Conference on Machine Learning, Taoyuan, Taiwan, 2011. [ | | ]
    Modern datasets are becoming heterogeneous. To this end, we present in this paper Mixed-Variate Restricted Boltzmann Machines for simultaneously modelling variables of multiple types and modalities, including binary and continuous responses, categorical options, multicategorical choices, ordinal assessment and category-ranked preferences. Dependency among variables is modeled using latent binary variables, each of which can be interpreted as a particular hidden aspect of the data. The proposed model, similar to the standard RBMs, allows fast evaluation of the posterior for the latent variables. Hence, it is naturally suitable for many common tasks including, but not limited to, (a) as a pre-processing step to convert complex input data into a more convenient vectorial representation through the latent posteriors, thereby offering a dimensionality reduction capacity, (b) as a classifier supporting binary, multiclass, multilabel, and label-ranking outputs, or a regression tool for continuous outputs and (c) as a data completion tool for multimodal and heterogeneous data. We evaluate the proposed model on a large-scale dataset using the world opinion survey results on three tasks: feature extraction and visualization, data completion and prediction.
    @INPROCEEDINGS { truyen_phung_venkatesh_acml11,
        TITLE = { Mixed-variate restricted {B}oltzmann machines },
        AUTHOR = { Tran, T. and Phung, D.Q. and Venkatesh, S. },
        BOOKTITLE = { Proc. of 3rd Asian Conference on Machine Learning },
        YEAR = { 2011 },
        ADDRESS = { Taoyuan, Taiwan },
        ABSTRACT = { Modern datasets are becoming heterogeneous. To this end, we present in this paper Mixed-Variate Restricted Boltzmann Machines for simultaneously modelling variables of multiple types and modalities, including binary and continuous responses, categorical options, multicategorical choices, ordinal assessment and category-ranked preferences. Dependency among variables is modeled using latent binary variables, each of which can be interpreted as a particular hidden aspect of the data. The proposed model, similar to the standard RBMs, allows fast evaluation of the posterior for the latent variables. Hence, it is naturally suitable for many common tasks including, but not limited to, (a) as a pre-processing step to convert complex input data into a more convenient vectorial representation through the latent posteriors, thereby offering a dimensionality reduction capacity, (b) as a classifier supporting binary, multiclass, multilabel, and label-ranking outputs, or a regression tool for continuous outputs and (c) as a data completion tool for multimodal and heterogeneous data. We evaluate the proposed model on a large-scale dataset using the world opinion survey results on three tasks: feature extraction and visualization, data completion and prediction. },
        COMMENT = { coauthor },
        URL = { 2011/conferences/tran_phung_venkatesh_acml11.pdf },
    }
C
  • Probabilistic Models over Ordered Partitions with Application in Learning to Rank
    Truyen, T., Phung, D. and Venkatesh, S. Technical report, Department of Computing, Curtin University, January 2011. [ | | pdf]
    This paper addresses the general problem of modelling and learning rank data with ties. We propose a probabilistic generative model that models the process as permutations over partitions. This results in a super-exponential combinatorial state space with unknown numbers of partitions and unknown ordering among them. We approach the problem from discrete choice theory, where subsets are chosen in a stagewise manner, reducing the state space per stage significantly. Further, we show that with suitable parameterisation, we can still learn the models in linear time. We evaluate the proposed models on the problem of learning to rank with the data from the recently held Yahoo! challenge, and demonstrate that the models are competitive against well-known rivals.
    @TECHREPORT { truyen_phung_venkatesh_tech11,
        TITLE = { Probabilistic Models over Ordered Partitions with Application in Learning to Rank },
        AUTHOR = { Truyen, T. and Phung, D. and Venkatesh, S. },
        INSTITUTION = { Department of Computing, Curtin University },
        YEAR = { 2011 },
        MONTH = { January },
        ABSTRACT = { This paper addresses the general problem of modelling and learning rank data with ties. We propose a probabilistic generative model that models the process as permutations over partitions. This results in a super-exponential combinatorial state space with unknown numbers of partitions and unknown ordering among them. We approach the problem from discrete choice theory, where subsets are chosen in a stagewise manner, reducing the state space per stage significantly. Further, we show that with suitable parameterisation, we can still learn the models in linear time. We evaluate the proposed models on the problem of learning to rank with the data from the recently held Yahoo! challenge, and demonstrate that the models are competitive against well-known rivals. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\truyen_phung_venkatesh_tech11.pdf:PDF },
        OWNER = { 184698H },
        TIMESTAMP = { 2011.02.07 },
        URL = { http://www.citeulike.org/user/dah/article/7806292 },
    }
R
  • Probabilistic Models over Ordered Partitions with Applications in Document Ranking and Collaborative Filtering
    Truyen, T., Phung, D. and Venkatesh, S. In Procs. of SIAM Int. Conf. on Data Mining (SDM), Arizona, USA, April 2011. [ | | pdf]
    Ranking is an important task for handling a large amount of content. Ideally, training data for supervised ranking would include a complete rank of documents (or other objects such as images or videos) for a particular query. However, this is only possible for small sets of documents. In practice, one often resorts to document rating, in that a subset of documents is assigned with a small number indicating the degree of relevance. This poses a general problem of modelling and learning rank data with ties. In this paper, we propose a probabilistic generative model that models the process as permutations over partitions. This results in a super-exponential combinatorial state space with unknown numbers of partitions and unknown ordering among them. We approach the problem from discrete choice theory, where subsets are chosen in a stagewise manner, reducing the state space per stage significantly. Further, we show that with suitable parameterisation, we can still learn the models in linear time. We evaluate the proposed models on two application areas: (i) document ranking with the data from the recently held Yahoo! challenge, and (ii) collaborative filtering with movie data. The results demonstrate that the models are competitive against well-known rivals.
    @INPROCEEDINGS { truyen_phung_venkatesh_sdm11,
        TITLE = { Probabilistic Models over Ordered Partitions with Applications in Document Ranking and Collaborative Filtering },
        AUTHOR = { Truyen, T. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Procs. of {SIAM} Int. Conf. on Data Mining (SDM) },
        YEAR = { 2011 },
        ADDRESS = { Arizona, USA },
        MONTH = { April },
        ABSTRACT = { Ranking is an important task for handling a large amount of content. Ideally, training data for supervised ranking would include a complete rank of documents (or other objects such as images or videos) for a particular query. However, this is only possible for small sets of documents. In practice, one often resorts to document rating, in that a subset of documents is assigned with a small number indicating the degree of relevance. This poses a general problem of modelling and learning rank data with ties. In this paper, we propose a probabilistic generative model that models the process as permutations over partitions. This results in a super-exponential combinatorial state space with unknown numbers of partitions and unknown ordering among them. We approach the problem from discrete choice theory, where subsets are chosen in a stagewise manner, reducing the state space per stage significantly. Further, we show that with suitable parameterisation, we can still learn the models in linear time. We evaluate the proposed models on two application areas: (i) document ranking with the data from the recently held Yahoo! challenge, and (ii) collaborative filtering with movie data. The results demonstrate that the models are competitive against well-known rivals. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\truyen_phung_venkatesh_sdm11.pdf:PDF },
        OWNER = { 184698H },
        TIMESTAMP = { 2011.02.07 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Truyen_etal_sdm11.pdf },
    }
C
  • Cognitive Intervention in Autism using Multimedia Stimulus (demo paper)
    Venkatesh, S., Greenhill, S., Phung, D. and Adams, B. In ACM Int. Conference on Multimedia, Arizona, USA, November 2011. [ | | pdf]
    We demonstrate an open multimedia-based system for delivering early intervention therapy for autism. Using flexible multi-touch interfaces together with principled ways to access rich content and tasks, we show how a syllabus can be translated into stimulus sets for early intervention. Media stimuli are able to be presented agnostic to language and media modality due to a semantic network of concepts and relations that are fundamental to language and cognitive development, which enable stimulus complexity to be adjusted to child performance. Being open, the system is able to assemble enough media stimuli to avoid children over-learning, and is able to be customised to a specific child which aids with engagement. Computer-based delivery enables automation of session logging and reporting, a fundamental and time-consuming part of therapy.
    @INPROCEEDINGS { venkatesh_phung_adams_acmmm11,
        TITLE = { Cognitive Intervention in Autism using Multimedia Stimulus (demo paper) },
        AUTHOR = { Venkatesh, S. and Greenhill, S. and Phung, D. and Adams, B. },
        BOOKTITLE = { ACM Int. Conference on Multimedia },
        ADDRESS = { Arizona, USA },
        YEAR = { 2011 },
        MONTH = { November },
        ABSTRACT = { We demonstrate an open multimedia-based system for delivering early intervention therapy for autism. Using flexible multi-touch interfaces together with principled ways to access rich content and tasks, we show how a syllabus can be translated into stimulus sets for early intervention. Media stimuli are able to be presented agnostic to language and media modality due to a semantic network of concepts and relations that are fundamental to language and cognitive development, which enable stimulus complexity to be adjusted to child performance. Being open, the system is able to assemble enough media stimuli to avoid children over-learning, and is able to be customised to a specific child which aids with engagement. Computer-based delivery enables automation of session logging and reporting, a fundamental and time-consuming part of therapy. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\venkatesh_phung_adams_acmmm11.pdf:PDF },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Venkatesh_etal_acmmm11.pdf },
    }
C
  • Surviving the data deluge: Scalable feature extraction, discrimination and analysis
    Venkatesh, S., Phung, D. and Pham, S. In NSF Workshop on Pervasive Computing at Scale, University of Washington, USA, Jan 2011. [ | | pdf]
    The world is awash with data from proliferating distributed and multimedia sensors. An International Data Corporation whitepaper (March 2008) notes “the digital universe in 2007 was 281 billion Gigabytes” and for the first time exceeded available storage. It predicts that by 2011, more than half of this information will not have a permanent home. Surveillance cameras, sensor-based applications, and social networks are among the named drivers of this explosion. We propose avenues of research to address the underlying issues in the collection and analysis of data from pervasive, heterogeneous and distributed sensors.
    @CONFERENCE { venkatesh_phung_pham_nsf11,
        TITLE = { Surviving the data deluge: Scalable feature extraction, discrimination and analysis },
        AUTHOR = { Venkatesh, S. and Phung, D. and Pham, S. },
        BOOKTITLE = { NSF Workshop on Pervasive Computing at Scale },
        YEAR = { 2011 },
        ADDRESS = { University of Washington, USA },
        MONTH = { Jan },
        ABSTRACT = { The world is awash with data from proliferating distributed and multimedia sensors. An International Data Corporation whitepaper (March 2008) notes “the digital universe in 2007 was 281 billion Gigabytes” and for the first time exceeded available storage. It predicts that by 2011, more than half of this information will not have a permanent home. Surveillance cameras, sensor-based applications, and social networks are among the named drivers of this explosion. We propose avenues of research to address the underlying issues in the collection and analysis of data from pervasive, heterogeneous and distributed sensors. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\Venkatesh_phung_pham_nsf11.pdf:PDF },
        OWNER = { 184698H },
        TIMESTAMP = { 2011.02.07 },
        URL = { http://sensorlab.cs.dartmouth.edu/NSFPervasiveComputingAtScale/pdf/1569390081.pdf },
    }
C
Prior to 2010
  • Discovery of Latent Subcommunities in a Blog's Readership
    Adams, B., Phung, D. and Venkatesh, S. ACM Trans. Web, 4(3):1-30, 2010. [ | | pdf]
    The blogosphere has grown to be a mainstream forum of social interaction as well as a commercially attractive source of information and influence. Tools are needed to better understand how communities that adhere to individual blogs are constituted in order to facilitate new personal, socially-focussed browsing paradigms, and understand how blog content is consumed, which is of interest to blog authors, big media and search. We present a novel approach to blog sub-community characterization by modelling individual blog readers using mixtures of an extension to the LDA family that jointly models phrases and time, Ngram Topic over Time (NTOT), and cluster with a number of similarity measures using Affinity Propagation. We experiment with two datasets: a small set of blogs whose authors provide feedback, and a set of popular, highly commented blogs, which provide indicators of algorithm scalability and interpretability without prior knowledge of a given blog. The results offer useful insight to the blog authors about their commenting community, and are observed to offer an integrated perspective on the topics of discussion and members engaged in those discussions for unfamiliar blogs. Our approach also holds promise as a component of solutions to related problems, such as online entity resolution and role discovery.
    @ARTICLE { adams_phung_venkatesh_tweb10,
        TITLE = { Discovery of Latent Subcommunities in a Blog's Readership },
        AUTHOR = { Adams, B. and Phung, D. and Venkatesh, S. },
        JOURNAL = { ACM Trans. Web },
        YEAR = { 2010 },
        NUMBER = { 3 },
        PAGES = { 1--30 },
        VOLUME = { 4 },
        ABSTRACT = { The blogosphere has grown to be a mainstream forum of social interaction as well as a commercially attractive source of information and influence. Tools are needed to better understand how communities that adhere to individual blogs are constituted in order to facilitate new personal, socially-focussed browsing paradigms, and understand how blog content is consumed, which is of interest to blog authors, big media and search. We present a novel approach to blog sub-community characterization by modelling individual blog readers using mixtures of an extension to the LDA family that jointly models phrases and time, Ngram Topic over Time (NTOT), and cluster with a number of similarity measures using Affinity Propagation. We experiment with two datasets: a small set of blogs whose authors provide feedback, and a set of popular, highly commented blogs, which provide indicators of algorithm scalability and interpretability without prior knowledge of a given blog. The results offer useful insight to the blog authors about their commenting community, and are observed to offer an integrated perspective on the topics of discussion and members engaged in those discussions for unfamiliar blogs. Our approach also holds promise as a component of solutions to related problems, such as online entity resolution and role discovery. },
        ADDRESS = { New York, NY, USA },
        COMMENT = { coauthor },
        DOI = { 10.1145/1806916.1806921 },
        FILE = { :papers\\phung\\adams_phung_venkatesh_tweb2010_discovery_manuscript.pdf:PDF;:phung\\adams_phung_venkatesh_tweb2010_discovery.pdf:PDF },
        ISSN = { 1559-1131 },
        OWNER = { Dinh Phung },
        PUBLISHER = { ACM },
        TIMESTAMP = { 2010.06.29 },
        URL = { http://doi.acm.org/10.1145/1806916.1806921 },
    }
J
  • Nonnegative Shared Subspace Learning and Its Application to Social Media Retrieval
    Gupta, S., Phung, D., Adams, B., Truyen, T. and Venkatesh, S. In Proc. of ACM Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD), Washington DC, USA, July 2010. [ | | pdf]
    Although tagging has become increasingly popular in online image and video sharing systems, tags are known to be noisy, ambiguous, incomplete and subjective. These factors can seriously affect the precision of a social tag-based web retrieval system. Therefore improving the precision performance of these social tag-based web retrieval systems has become an increasingly important research topic. To this end, we propose a shared subspace learning framework to leverage a secondary source to improve retrieval performance from a primary dataset. This is achieved by learning a shared subspace between the two sources under a joint Nonnegative Matrix Factorization in which the level of subspace sharing can be explicitly controlled. We derive an efficient algorithm for learning the factorization, analyze its complexity, and provide proof of convergence. We validate the framework on image and video retrieval tasks in which tags from the LabelMe dataset are used to improve image retrieval performance from a Flickr dataset and video retrieval performance from a YouTube dataset. This has implications for how to exploit and transfer knowledge from readily available auxiliary tagging resources to improve another social web retrieval system. Our shared subspace learning framework is applicable to a range of problems where one needs to exploit the strengths existing among multiple and heterogeneous datasets.
    @INPROCEEDINGS { gupta_phung_adams_truyen_venkatesh_sigkdd10,
        TITLE = { Nonnegative Shared Subspace Learning and Its Application to Social Media Retrieval },
        AUTHOR = { Gupta, S. and Phung, D. and Adams, B. and Truyen, T. and Venkatesh, S. },
        BOOKTITLE = { Proc. of ACM Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD) },
        YEAR = { 2010 },
        ADDRESS = { Washington DC, USA },
        MONTH = { July },
        ABSTRACT = { Although tagging has become increasingly popular in online image and video sharing systems, tags are known to be noisy, ambiguous, incomplete and subjective. These factors can seriously affect the precision of a social tag-based web retrieval system. Therefore improving the precision performance of these social tag-based web retrieval systems has become an increasingly important research topic. To this end, we propose a shared subspace learning framework to leverage a secondary source to improve retrieval performance from a primary dataset. This is achieved by learning a shared subspace between the two sources under a joint Nonnegative Matrix Factorization in which the level of subspace sharing can be explicitly controlled. We derive an efficient algorithm for learning the factorization, analyze its complexity, and provide proof of convergence. We validate the framework on image and video retrieval tasks in which tags from the LabelMe dataset are used to improve image retrieval performance from a Flickr dataset and video retrieval performance from a YouTube dataset. This has implications for how to exploit and transfer knowledge from readily available auxiliary tagging resources to improve another social web retrieval system. Our shared subspace learning framework is applicable to a range of problems where one needs to exploit the strengths existing among multiple and heterogeneous datasets. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\gupta_phung_adams_truyen_venkatesh_sigkdd10.pdf:PDF },
        OWNER = { Dinh Phung },
        TIMESTAMP = { 2010.06.29 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Gupta_etal_sigkdd10.pdf },
    }
C
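The shared-subspace idea above can be illustrated with a toy sketch. The paper controls the level of subspace sharing explicitly; the minimal version below assumes *full* sharing and uses standard Lee–Seung multiplicative updates rather than the paper's algorithm, with made-up matrix sizes: two tag-by-item matrices over the same tag vocabulary are column-concatenated and factorized against one common nonnegative basis.

```python
import numpy as np

def nmf_multiplicative(X, k, iters=200, seed=0):
    """Standard Lee-Seung multiplicative updates for X ~ W @ H."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + 1e-3
    H = rng.random((k, n)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
        W *= (X @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# X1 = primary source (e.g. a Flickr-like tag matrix),
# X2 = auxiliary source (e.g. a LabelMe-like tag matrix); random stand-ins here.
rng = np.random.default_rng(1)
X1 = rng.random((50, 40))
X2 = rng.random((50, 30))

# Fully shared subspace: one basis W explains both sources jointly.
X = np.hstack([X1, X2])
W, H = nmf_multiplicative(X, k=5)
H1, H2 = H[:, :40], H[:, 40:]

err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(round(err, 3))
```

In the paper's setting the basis is partitioned into shared and source-specific parts so that the degree of sharing is tunable; here `W` is entirely shared for brevity.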
  • Hyper-Community Detection in the Blogosphere
    Nguyen, T., Phung, D., Adams, B., Tran, T. and Venkatesh, S.. In Proc. of ACM Workshop on Social media, in conjunction with ACM Int. Conf on Multimedia (ACM-MM), Firenze, Italy, October 2010. [ | | pdf]
    Most existing work on learning community structure in social networks is graph-based, with links among members represented as an adjacency matrix encoding direct pairwise associations. In this paper, we propose a method to group online communities in the blogosphere based on the topics learnt from the content blogged. We then consider a different type of online community formulation – the sentiment-based grouping of online communities. The problem of sentiment-based clustering for community structure discovery is rich with many interesting open aspects to be explored. We propose a novel approach for addressing hyper-community detection based on users’ sentiment. We employ nonparametric clustering to automatically discover hidden hyper-communities and present the results obtained from a large dataset.
    @CONFERENCE { nguyen_phung_adams_tran_venkatesh_acmmm10,
        TITLE = { Hyper-Community Detection in the Blogosphere },
        AUTHOR = { Nguyen, T. and Phung, D. and Adams, B. and Tran, T. and Venkatesh, S. },
        BOOKTITLE = { Proc. of ACM Workshop on Social media, in conjunction with ACM Int. Conf on Multimedia (ACM-MM) },
        YEAR = { 2010 },
        ADDRESS = { Firenze, Italy },
        MONTH = { October },
        ABSTRACT = { Most existing work on learning community structure in social networks is graph-based, with links among members represented as an adjacency matrix encoding direct pairwise associations. In this paper, we propose a method to group online communities in the blogosphere based on the topics learnt from the content blogged. We then consider a different type of online community formulation – the sentiment-based grouping of online communities. The problem of sentiment-based clustering for community structure discovery is rich with many interesting open aspects to be explored. We propose a novel approach for addressing hyper-community detection based on users’ sentiment. We employ nonparametric clustering to automatically discover hidden hyper-communities and present the results obtained from a large dataset. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\Nguyen_Phung_Adams_Truyen_Venkatesh_acm10hypercommunity.pdf:PDF },
        OWNER = { 14135433 },
        TIMESTAMP = { 2010.07.23 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Nguyen_etal_acm10.pdf },
    }
C
  • Classification and Pattern Discovery of Mood in Weblogs
    Nguyen, T., Phung, D., Adams, B., Tran, T. and Venkatesh, S.. In Advances in Knowledge Discovery and Data Mining, pages 283-290, Springer, 2010. [ | | pdf]
    Automatic data-driven analysis of mood from text is an emerging problem with many potential applications. Unlike generic text categorization, mood classification based on textual features is complicated by various factors, including its context- and user-sensitive nature. We present a comprehensive study of different feature selection schemes in machine learning for the problem of mood classification in weblogs. Notably, we introduce the novel use of a feature set based on the affective norms for English words (ANEW) lexicon studied in psychology. This feature set has the advantage of being computationally efficient while maintaining accuracy comparable to other state-of-the-art feature sets experimented with. In addition, we present results of data-driven clustering on a dataset of over 17 million blog posts with mood groundtruth. Our analysis reveals an interesting, and readily interpreted, structure to the linguistic expression of emotion, one that comprises valuable empirical evidence in support of existing psychological models of emotion, and in particular the dipoles pleasure–displeasure and activation–deactivation.
    @INCOLLECTION { nguyen_phung_adams_truyen_venkatesh_pakdd10,
        TITLE = { Classification and Pattern Discovery of Mood in Weblogs },
        AUTHOR = { Nguyen, T. and Phung, D. and Adams, B. and Tran, T. and Venkatesh, S. },
        BOOKTITLE = { Advances in Knowledge Discovery and Data Mining },
        PUBLISHER = { Springer },
        YEAR = { 2010 },
        EDITOR = { Mohammed J. Zaki and Jeffrey Xu Yu and B. Ravindran and Vikram Pudi },
        PAGES = { 283--290 },
        ABSTRACT = { Automatic data-driven analysis of mood from text is an emerging problem with many potential applications. Unlike generic text categorization, mood classification based on textual features is complicated by various factors, including its context- and user-sensitive nature. We present a comprehensive study of different feature selection schemes in machine learning for the problem of mood classification in weblogs. Notably, we introduce the novel use of a feature set based on the affective norms for English words (ANEW) lexicon studied in psychology. This feature set has the advantage of being computationally efficient while maintaining accuracy comparable to other state-of-the-art feature sets experimented with. In addition, we present results of data-driven clustering on a dataset of over 17 million blog posts with mood groundtruth. Our analysis reveals an interesting, and readily interpreted, structure to the linguistic expression of emotion, one that comprises valuable empirical evidence in support of existing psychological models of emotion, and in particular the dipoles pleasure–displeasure and activation–deactivation. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\Nguyen_Phung_Adams_Truyen_Venkatesh_pakdd10classification.pdf:PDF },
        OWNER = { 14135433 },
        TIMESTAMP = { 2010.07.23 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Nguyen_etal_pakdd10.pdf },
    }
BC
  • Social Reader: Following Social Networks in the Wilds of the Blogosphere
    B. Adams, D. Phung and S. Venkatesh. In First Int. Workshop on Social Media, in conjunction with ACM Conference on Multimedia (ACMMM-WSM), pages 73-80, Beijing, China, October 2009. [ | | pdf]
    The social interactions manifest in blogs by the network of comments left by owners and readers are an under-used resource, both for blog pundits and industry. We present a web-based feed reader that renders these relationships with a graph representation, and enables exploration by displaying people and blogs who are proximate to a user's network. Social Reader is an example of Casual Information Visualization, and aims to help the user understand and explore blog-based social networks in a daily, real-life setting. A six-week study of the software involving 20 users confirmed the usefulness of the novel visual display, via a quantitative analysis of use logs, and an exit survey.
    @INPROCEEDINGS { adams_phung_venkatesh_acmmm09,
        TITLE = { Social Reader: Following Social Networks in the Wilds of the Blogosphere },
        AUTHOR = { B. Adams and D. Phung and S. Venkatesh },
        BOOKTITLE = { First Int. Workshop on Social Media, in conjunction with ACM Conference on Multimedia (ACMMM-WSM) },
        YEAR = { 2009 },
        ADDRESS = { Beijing, China },
        MONTH = { October },
        PAGES = { 73-80 },
        ABSTRACT = { The social interactions manifest in blogs by the network of comments left by owners and readers are an under-used resource, both for blog pundits and industry. We present a web-based feed reader that renders these relationships with a graph representation, and enables exploration by displaying people and blogs who are proximate to a user's network. Social Reader is an example of Casual Information Visualization, and aims to help the user understand and explore blog-based social networks in a daily, real-life setting. A six-week study of the software involving 20 users confirmed the usefulness of the novel visual display, via a quantitative analysis of use logs, and an exit survey. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\adams_phung_venkatesh_iws09_social.pdf:PDF },
        LOCATION = { Beijing, China },
        OWNER = { Dinh Phung },
        TIMESTAMP = { 2009.09.22 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Adams_el_iwsm09.pdf },
    }
C
  • Efficient duration and hierarchical modeling for human activity recognition
    Duong, Thi, Phung, Dinh, Bui, Hung and Venkatesh, Svetha. Artificial Intelligence (AIJ), 173(7-8):830-856, 2009. [ | | pdf | code]
    A challenge in building pervasive and smart spaces is to learn and recognize human activities of daily living (ADLs). In this paper, we address this problem and argue that in dealing with ADLs, it is beneficial to exploit both their typical duration patterns and inherent hierarchical structures. We exploit efficient duration modeling using the novel Coxian distribution to form the Coxian hidden semi-Markov model (CxHSMM) and apply it to the problem of learning and recognizing ADLs with complex temporal dependencies. The Coxian duration model has several advantages over existing duration parameterization using multinomial or exponential family distributions, including its denseness in the space of non-negative distributions, low number of parameters, computational efficiency and the existence of closed-form estimation solutions. Further, we combine both hierarchical and duration extensions of the hidden Markov model (HMM) to form the novel switching hidden semi-Markov model (SHSMM), and empirically compare its performance with existing models. The model can learn what an occupant normally does during the day from unsegmented training data and then perform online activity classification, segmentation and abnormality detection. Experimental results show that Coxian modeling outperforms a range of baseline models for the task of activity segmentation. We also achieve a recognition accuracy competitive with the current state-of-the-art multinomial duration model, while gaining a significant reduction in computation. Furthermore, cross-validation model selection on the number of phases K in the Coxian indicates that only a small K is required to achieve optimal performance. Finally, our models are further tested in a more challenging setting in which the tracking is often lost and the activities considerably overlap. With a small amount of labels supplied during training in a partially supervised learning mode, our models are again able to deliver reliable performance, again with a small number of phases, making our proposed framework an attractive choice for activity modeling.
    @ARTICLE { duong_phung_bui_venkatesh_aij09,
        AUTHOR = { Duong, Thi and Phung, Dinh and Bui, Hung and Venkatesh, Svetha },
        TITLE = { Efficient duration and hierarchical modeling for human activity recognition },
        JOURNAL = { Artificial Intelligence (AIJ) },
        YEAR = { 2009 },
        VOLUME = { 173 },
        NUMBER = { 7-8 },
        PAGES = { 830--856 },
        ABSTRACT = { A challenge in building pervasive and smart spaces is to learn and recognize human activities of daily living (ADLs). In this paper, we address this problem and argue that in dealing with ADLs, it is beneficial to exploit both their typical duration patterns and inherent hierarchical structures. We exploit efficient duration modeling using the novel Coxian distribution to form the Coxian hidden semi-Markov model (CxHSMM) and apply it to the problem of learning and recognizing ADLs with complex temporal dependencies. The Coxian duration model has several advantages over existing duration parameterization using multinomial or exponential family distributions, including its denseness in the space of non-negative distributions, low number of parameters, computational efficiency and the existence of closed-form estimation solutions. Further, we combine both hierarchical and duration extensions of the hidden Markov model (HMM) to form the novel switching hidden semi-Markov model (SHSMM), and empirically compare its performance with existing models. The model can learn what an occupant normally does during the day from unsegmented training data and then perform online activity classification, segmentation and abnormality detection. Experimental results show that Coxian modeling outperforms a range of baseline models for the task of activity segmentation. We also achieve a recognition accuracy competitive with the current state-of-the-art multinomial duration model, while gaining a significant reduction in computation. Furthermore, cross-validation model selection on the number of phases K in the Coxian indicates that only a small K is required to achieve optimal performance. Finally, our models are further tested in a more challenging setting in which the tracking is often lost and the activities considerably overlap. With a small amount of labels supplied during training in a partially supervised learning mode, our models are again able to deliver reliable performance, again with a small number of phases, making our proposed framework an attractive choice for activity modeling. },
        CODE = { https://github.com/DASCIMAL/CxHSMM },
        COMMENT = { coauthor },
        DOI = { http://dx.doi.org/10.1016/j.artint.2008.12.005 },
        FILE = { :duong_phung_bui_venkatesh_aij09 - Efficient Duration and Hierarchical Modeling for Human Activity Recognition.pdf:PDF },
        KEYWORDS = { activity, recognition, duration modeling, Coxian, Hidden semi-Markov model, HSMM , smart surveillance },
        OWNER = { 184698H },
        PUBLISHER = { Elsevier },
        TIMESTAMP = { 2010.08.11 },
        URL = { http://www.sciencedirect.com/science/article/pii/S0004370208002142 },
    }
J
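The discrete Coxian duration model at the heart of the CxHSMM admits a very compact sampler, which helps explain the "low number of parameters" claim: K phases need only a geometric dwell parameter each plus exit probabilities. The sketch below is an illustrative parameterization, not taken from the paper; `mu` and `p` are invented values. It draws durations and checks the Monte Carlo mean against the closed-form expectation.

```python
import numpy as np

def sample_coxian(mu, p, rng):
    """Draw one duration from a discrete Coxian distribution.

    mu[i] : per-step probability of leaving phase i (geometric dwell).
    p[i]  : probability of absorbing (ending the duration) on leaving
            phase i; leaving the final phase always absorbs.
    """
    d = 0
    for i in range(len(mu)):
        d += rng.geometric(mu[i])                # steps spent in phase i
        if i == len(mu) - 1 or rng.random() < p[i]:
            return d

mu = [0.5, 0.2, 0.1]   # K = 3 phases
p = [0.3, 0.5]         # exit probabilities after phases 1 and 2

rng = np.random.default_rng(0)
draws = [sample_coxian(mu, p, rng) for _ in range(20000)]

# Closed-form mean: sum over phases of P(reach phase i) / mu[i].
reach = [1.0, 1 - p[0], (1 - p[0]) * (1 - p[1])]
analytic_mean = sum(r / m for r, m in zip(reach, mu))
print(round(analytic_mean, 3), round(float(np.mean(draws)), 2))
```

With these values the analytic mean is 2 + 3.5 + 3.5 = 9 steps, and the sample mean should land close to it.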
  • Flickr hypergroups
    Negoescu, R.A., Adams, B., Phung, D., Venkatesh, S. and Gatica-Perez, D.. In Proceedings of the seventeenth ACM international conference on Multimedia, pages 813-816, Beijing, China, October 2009. [ | | pdf]
    @CONFERENCE { negoescu_adams_phung_venkatesh_gatica_acmmm09,
        TITLE = { Flickr hypergroups },
        AUTHOR = { Negoescu, R.A. and Adams, B. and Phung, D. and Venkatesh, S. and Gatica-Perez, D. },
        BOOKTITLE = { Proceedings of the seventeenth ACM international conference on Multimedia },
        YEAR = { 2009 },
        ADDRESS = { Beijing, China },
        MONTH = { October },
        PAGES = { 813--816 },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\Negoescu_etal_acmmm2009_flickr.pdf:PDF },
        OWNER = { 14135433 },
        TIMESTAMP = { 2010.07.19 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Radu_el_acmmm09.pdf },
    }
C
  • High accuracy context recovery using clustering mechanisms
    Phung, D., Adams, B., Tran, K., Venkatesh, S. and Kumar, M.. In Proc. IEEE Int. Conf. on Pervasive Computing and Communications (PerCom), pages 1-9, Texas, USA, March 2009. [ | | pdf]
    This paper examines the recovery of user context in indoor environments with existing wireless infrastructures to enable assistive systems. We present a novel approach to the extraction of user context, casting the problem of context recovery as an unsupervised, clustering problem. A well known density-based clustering technique, DBSCAN, is adapted to recover user context that includes user motion state, and significant places the user visits from WiFi observations consisting of access point id and signal strength. Furthermore, user rhythms or sequences of places the user visits periodically are derived from the above low level contexts by employing a state-of-the-art probabilistic clustering technique, the Latent Dirichlet Allocation (LDA), to enable a variety of application services. Experimental results with real data are presented to validate the proposed unsupervised learning approach and demonstrate its applicability.
    @INPROCEEDINGS { phung_adams_tran_venkatesh_kumar_percom09,
        TITLE = { High accuracy context recovery using clustering mechanisms },
        AUTHOR = { Phung, D. and Adams, B. and Tran, K. and Venkatesh, S. and Kumar, M. },
        BOOKTITLE = { Proc. {IEEE} Int. Conf. on Pervasive Computing and Communications (PerCom) },
        YEAR = { 2009 },
        ADDRESS = { Texas, USA },
        MONTH = { March },
        PAGES = { 1-9 },
        ABSTRACT = { This paper examines the recovery of user context in indoor environments with existing wireless infrastructures to enable assistive systems. We present a novel approach to the extraction of user context, casting the problem of context recovery as an unsupervised, clustering problem. A well known density-based clustering technique, DBSCAN, is adapted to recover user context that includes user motion state, and significant places the user visits from WiFi observations consisting of access point id and signal strength. Furthermore, user rhythms or sequences of places the user visits periodically are derived from the above low level contexts by employing a state-of-the-art probabilistic clustering technique, the Latent Dirichlet Allocation (LDA), to enable a variety of application services. Experimental results with real data are presented to validate the proposed unsupervised learning approach and demonstrate its applicability. },
        DOI = { http://doi.ieeecomputersociety.org/10.1109/PERCOM.2009.4912760 },
        FILE = { :papers\\phung\\phung_adams_tran_venkatesh_kumar_percom09_high.pdf:PDF },
        OWNER = { Dinh Phung },
        TIMESTAMP = { 2009.09.22 },
        URL = { http://www.computing.edu.au/~phung/wiki_new/uploads/Main/Phung_el_percom09.pdf },
    }
C
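The clustering step above groups WiFi fingerprints (vectors of per-access-point signal strengths) with DBSCAN to recover significant places. As a rough illustration only (this is a textbook DBSCAN, not the paper's adapted variant, and the RSSI values, `eps` and `min_pts` settings are invented), the sketch separates two synthetic places from transit noise:

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Textbook DBSCAN; returns one cluster label per point (-1 = noise)."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                      # visited, or not a core point
        labels[i] = cluster               # seed a new cluster
        frontier = list(neighbors[i])
        while frontier:                   # density-reachable expansion
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:
                    frontier.extend(neighbors[j])
        cluster += 1
    return labels

# Synthetic RSSI fingerprints (dBm) over 3 access points: two "places"
# plus a handful of transit observations.
rng = np.random.default_rng(0)
office = rng.normal([-40, -70, -90], 2.0, size=(30, 3))
cafe = rng.normal([-85, -50, -60], 2.0, size=(30, 3))
transit = rng.uniform(-95, -35, size=(5, 3))
labels = dbscan(np.vstack([office, cafe, transit]), eps=8.0, min_pts=4)
print(sorted(set(labels) - {-1}))  # discovered significant places
```

Each dense region of fingerprint space becomes one "place"; sparse transit readings either attach as border points or remain noise.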
  • Unsupervised Context Detection Using Wireless Signals
    Phung, D., Adams, B., Venkatesh, S. and Kumar, M.. Pervasive and Mobile Computing, 5(6):714-733, 2009. [ | | pdf]
    Sensing context plays an important role in many pervasive and mobile computing applications. In this paper, we present an unsupervised framework for extracting user context in indoor environments with existing wireless infrastructures. Our novel approach casts context detection into an incremental, unsupervised clustering setting. Using WiFi observations consisting of access point identification and signal strengths freely available in office or public spaces, we adapt a density-based clustering technique to recover basic forms of user contexts that include user motion state and significant places the user visits from time to time. High-level user context, termed rhythms, comprising sequences of significant places, is derived from the above low-level context by employing probabilistic clustering techniques, latent Dirichlet allocation and its n-gram temporal extension. These user contexts can enable a wide range of context-aware application services. Experimental results with real data in comparison with existing methods are presented to validate the proposed approach. Our motion classification algorithm operates in real-time, and achieves improvement over an existing method; significant locations are detected with over accuracy and near perfect cluster purity. Richer indoor context and meaningful rhythms, such as typical daily routines or meeting patterns, are also inferred automatically from collected raw WiFi signals.
    @ARTICLE { phung_adams_venkatesh_kumar_pmc09,
        TITLE = { Unsupervised Context Detection Using Wireless Signals },
        AUTHOR = { Phung, D. and Adams, B. and Venkatesh, S. and Kumar, M. },
        JOURNAL = { Pervasive and Mobile Computing },
        YEAR = { 2009 },
        NUMBER = { 6 },
        PAGES = { 714--733 },
        VOLUME = { 5 },
        ABSTRACT = { Sensing context plays an important role in many pervasive and mobile computing applications. In this paper, we present an unsupervised framework for extracting user context in indoor environments with existing wireless infrastructures. Our novel approach casts context detection into an incremental, unsupervised clustering setting. Using WiFi observations consisting of access point identification and signal strengths freely available in office or public spaces, we adapt a density-based clustering technique to recover basic forms of user contexts that include user motion state and significant places the user visits from time to time. High-level user context, termed rhythms, comprising sequences of significant places, is derived from the above low-level context by employing probabilistic clustering techniques, latent Dirichlet allocation and its n-gram temporal extension. These user contexts can enable a wide range of context-aware application services. Experimental results with real data in comparison with existing methods are presented to validate the proposed approach. Our motion classification algorithm operates in real-time, and achieves improvement over an existing method; significant locations are detected with over accuracy and near perfect cluster purity. Richer indoor context and meaningful rhythms, such as typical daily routines or meeting patterns, are also inferred automatically from collected raw WiFi signals. },
        DOI = { 10.1016/j.pmcj.2009.07.005 },
        FILE = { :papers\\phung\\phung_adams_venkatesh_kumar_pmc09_unsupervised.pdf:PDF },
        ISSN = { 1574-1192 },
        OWNER = { Dinh Phung },
        PUBLISHER = { Elsevier },
        TIMESTAMP = { 2009.09.22 },
        URL = { http://www.sciencedirect.com/science/article/B7MF1-4WSHKB0-2/2/169d7eac1d55583c70314008eb511f34 },
    }
J
  • MCMC for Hierarchical Semi-Markov Conditional Random Fields
    Truyen, T., Phung, D., Bui, H. and Venkatesh, S.. In Proceedings of the NIPS (Advances in Neural Information Processing Systems) Workshop on Deep Learning for Speech Recognition and Related Applications, Whistler, BC, Canada, December 2009. [ | ]
    Deep architectures such as hierarchical semi-Markov models are an important class of models for nested sequential data. Current exact inference schemes either cost cubic time in sequence length or exponential time in model depth. These costs are prohibitive for large-scale problems with arbitrary length and depth. In this contribution, we propose a new approximation technique that may have the potential to achieve sub-cubic time complexity in length and linear time in depth, at the cost of some loss of quality. The idea is based on two well-known methods: Gibbs sampling and Rao-Blackwellisation. We provide some simulation-based evaluation of the quality of the RGBS with respect to run time and sequence length.
    @INPROCEEDINGS { truyen_phung_bui_venkatesh_nips09,
        TITLE = { {MCMC} for Hierarchical Semi-Markov Conditional Random Fields },
        AUTHOR = { Truyen, T. and Phung, D. and Bui, H. and Venkatesh, S. },
        BOOKTITLE = { Proceedings of the NIPS (Advances in Neural Information Processing Systems) Workshop on Deep Learning for Speech Recognition and Related Applications },
        YEAR = { 2009 },
        ADDRESS = { Whistler, BC, Canada },
        MONTH = { December },
        ABSTRACT = { Deep architectures such as hierarchical semi-Markov models are an important class of models for nested sequential data. Current exact inference schemes either cost cubic time in sequence length or exponential time in model depth. These costs are prohibitive for large-scale problems with arbitrary length and depth. In this contribution, we propose a new approximation technique that may have the potential to achieve sub-cubic time complexity in length and linear time in depth, at the cost of some loss of quality. The idea is based on two well-known methods: Gibbs sampling and Rao-Blackwellisation. We provide some simulation-based evaluation of the quality of the RGBS with respect to run time and sequence length. },
        COMMENT = { coauthor },
        OWNER = { Dinh Phung },
        TIMESTAMP = { 2010.06.29 },
    }
C
  • Ordinal Boltzmann Machines for Collaborative Filtering
    Truyen Tran, Dinh Phung and Svetha Venkatesh. In Proc. of the 25th Conference on Uncertainty in Artificial Intelligence (UAI), pages 548-556, Arlington, Virginia, United States, June 2009. (Runner-up Best Paper Award). [ | | pdf]
    Collaborative filtering is an effective recommendation technique wherein the preference of an individual can potentially be predicted based on preferences of other members. Early algorithms often relied on the strong locality in the preference data, that is, it is enough to predict preference of a user on a particular item based on a small subset of other users with similar tastes or of other items with similar properties. More recently, dimensionality reduction techniques have proved to be equally competitive, and these are based on the co-occurrence patterns rather than locality. This paper explores and extends a probabilistic model known as Boltzmann Machine for collaborative filtering tasks. It seamlessly integrates both the similarity and co-occurrence in a principled manner. In particular, we study parameterisation options to deal with the ordinal nature of the preferences, and propose a joint modelling of both the user-based and item-based processes. Experiments on moderate and large-scale movie recommendation show that our framework rivals existing well-known methods.
    @INPROCEEDINGS { truyen_phung_venkatesh_uai09,
        AUTHOR = { Truyen Tran and Dinh Phung and Svetha Venkatesh },
        TITLE = { Ordinal Boltzmann Machines for Collaborative Filtering },
        BOOKTITLE = { Proc. of the 25th Conference on Uncertainty in Artificial Intelligence (UAI) },
        YEAR = { 2009 },
        SERIES = { UAI '09 },
        PAGES = { 548--556 },
        ADDRESS = { Arlington, Virginia, United States },
        MONTH = { June },
        PUBLISHER = { AUAI Press },
        NOTE = { Runner-up Best Paper Award },
        ABSTRACT = { Collaborative filtering is an effective recommendation technique wherein the preference of an individual can potentially be predicted based on preferences of other members. Early algorithms often relied on the strong locality in the preference data, that is, it is enough to predict preference of a user on a particular item based on a small subset of other users with similar tastes or of other items with similar properties. More recently, dimensionality reduction techniques have proved to be equally competitive, and these are based on the co-occurrence patterns rather than locality. This paper explores and extends a probabilistic model known as Boltzmann Machine for collaborative filtering tasks. It seamlessly integrates both the similarity and co-occurrence in a principled manner. In particular, we study parameterisation options to deal with the ordinal nature of the preferences, and propose a joint modelling of both the user-based and item-based processes. Experiments on moderate and large-scale movie recommendation show that our framework rivals existing well-known methods. },
        ACMID = { 1795178 },
        COMMENT = { coauthor },
        FILE = { :truyen_phung_venkatesh_uai09 - Ordinal Boltzmann Machines for Collaborative Filtering.pdf:PDF },
        ISBN = { 978-0-9749039-5-8 },
        LOCATION = { Montreal, Quebec, Canada },
        NUMPAGES = { 9 },
        OWNER = { Dinh Phung },
        TIMESTAMP = { 2009.09.22 },
        URL = { http://dl.acm.org/citation.cfm?id=1795114.1795178 },
    }
C
  • Sensing and Using Social Context
    Adams, B., Phung, D. and Venkatesh, S.. ACM Transactions on Multimedia Computing, Communications and Applications, 5(2):11-27, November 2008. [ | ]
    We present online algorithms to extract social context: Social spheres are labelled locations of significance, represented as convex hulls extracted from GPS traces. Colocation is determined from Bluetooth and GPS to extract social rhythms, patterns in time, duration, place and people corresponding to real-world activities. Social ties are formulated from proximity and shared spheres and rhythms. Quantitative evaluation is performed for 10+ million samples over 45 man-months. Applications are presented with assessment of perceived utility: Socio-Graph, a video and photo browser with filters for social metadata, and Jive, a blog browser that uses rhythms to discover similarity between entries automatically.
    @ARTICLE { adams_phung_venkatesh_tomccap08,
        TITLE = { Sensing and Using Social Context },
        AUTHOR = { Adams, B. and Phung, D. and Venkatesh, S. },
        JOURNAL = { {ACM} Transactions on Multimedia Computing, Communications and Applications },
        YEAR = { 2008 },
        MONTH = { November },
        NUMBER = { 2 },
        PAGES = { 11-27 },
        VOLUME = { 5 },
        ABSTRACT = { We present online algorithms to extract social context: Social spheres are labelled locations of significance, represented as convex hulls extracted from GPS traces. Colocation is determined from Bluetooth and GPS to extract social rhythms, patterns in time, duration, place and people corresponding to real-world activities. Social ties are formulated from proximity and shared spheres and rhythms. Quantitative evaluation is performed for 10+ million samples over 45 man-months. Applications are presented with assessment of perceived utility: Socio-Graph, a video and photo browser with filters for social metadata, and Jive, a blog browser that uses rhythms to discover similarity between entries automatically. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\adams_phung_venkatesh_tomccap08sensing.pdf:PDF },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
J
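Representing a social sphere as the convex hull of GPS fixes at a significant location is simple to sketch. The fragment below uses Andrew's monotone-chain algorithm on planar (x, y) projections of the fixes; the coordinates are invented and the paper's online/incremental details are omitted.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); > 0 means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def chain(seq):
        h = []
        for p in seq:
            # Pop while the last two kept points and p fail to turn left.
            while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h
    lower, upper = chain(pts), chain(reversed(pts))
    return lower[:-1] + upper[:-1]

# Projected GPS fixes around one significant place: four boundary fixes
# and two interior ones; the hull is the sphere's boundary polygon.
fixes = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
sphere = convex_hull(fixes)
print(sphere)  # -> [(0, 0), (4, 0), (4, 4), (0, 4)]
```

Interior fixes drop out automatically, so the stored sphere stays compact no matter how many samples accumulate at a location.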
  • The Hidden Permutation Model and Location-Based Activity Recognition
    H. Bui, D. Phung, S. Venkatesh and H. Phan. In Proc. of National Conference on Artificial Intelligence (AAAI), pages 1345-1350, Chicago, USA, July 2008. [ | ]
    Permutation modeling is challenging because of the combinatorial nature of the problem. However, such modeling is often required in many real-world applications, including activity recognition where subactivities are often permuted and partially ordered. This paper introduces a novel Hidden Permutation Model (HPM) that can learn the partial ordering constraints in permuted state sequences. The HPM is parameterized as an exponential family distribution and is flexible so that it can encode constraints via different feature functions. A chain-flipping Metropolis-Hastings Markov chain Monte Carlo (MCMC) is employed for inference to overcome the O(n!) complexity. Gradient-based maximum likelihood parameter learning is presented for two cases: when the permutation is known and when it is hidden. The HPM is evaluated using both simulated and real data from a location-based activity recognition domain. Experimental results indicate that the HPM performs far better than other baseline models, including the naive Bayes classifier, the HMM classifier, and Kirshner's multinomial permutation model. Our presented HPM is generic and can potentially be utilized in any problem where the modeling of permuted states from noisy data is needed.
    @INPROCEEDINGS { bui_phung_venkatesh_phan_aaai08,
        TITLE = { The Hidden Permutation Model and Location-Based Activity Recognition },
        AUTHOR = { H. Bui and D. Phung and S. Venkatesh and H. Phan },
        BOOKTITLE = { Proc. of National Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2008 },
        ADDRESS = { Chicago, USA },
        MONTH = { July },
        PAGES = { 1345--1350 },
        VOLUME = { 8 },
        ABSTRACT = { Permutation modeling is challenging because of the combinatorial nature of the problem. However, such modeling is often required in many real-world applications, including activity recognition where subactivities are often permuted and partially ordered. This paper introduces a novel Hidden Permutation Model (HPM) that can learn the partial ordering constraints in permuted state sequences. The HPM is parameterized as an exponential family distribution and is flexible so that it can encode constraints via different feature functions. A chain-flipping Metropolis-Hastings Markov chain Monte Carlo (MCMC) is employed for inference to overcome the O(n!) complexity. Gradient-based maximum likelihood parameter learning is presented for two cases: when the permutation is known and when it is hidden. The HPM is evaluated using both simulated and real data from a location-based activity recognition domain. Experimental results indicate that the HPM performs far better than other baseline models, including the naive Bayes classifier, the HMM classifier, and Kirshner's multinomial permutation model. Our presented HPM is generic and can potentially be utilized in any problem where the modeling of permuted states from noisy data is needed. },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Computable Social Patterns from Sparse Sensor Data
    Phung, D., Adams, B. and Venkatesh, S.. In First Int. Workshop on Location and the Web, in conjunction with the World Wide Web Conference, pages 69-72, Beijing, China, April 2008. [ | | ]
    We present a computational framework to automatically discover high-order temporal social patterns from very noisy and sparse location data. We introduce the concept of \emph{social footprint} and present a method to construct a codebook, enabling the transformation of raw sensor data into a collection of social pages. Each page captures the social activities of a user over a regular time period and is represented as a sequence of encoded footprints. Computable patterns are then defined as repeated structures found in these sequences. To do so, we appeal to modeling tools in document analysis and propose a Latent Social theme Dirichlet Allocation (LSDA) model -- a version of the Ngram topic model in \cite{Wang_el:07} with extra modeling of personal context. This model can be viewed as a Bayesian clustering method, jointly discovering temporal collocation of footprints and exploiting statistical strength across social pages, to automatically discover high-order patterns. Alternatively, it can be viewed as a dimensionality reduction method where the reduced latent space can be interpreted as the hidden social `theme' -- a more abstract perception of the user's daily activities. Applying this framework to a real-world noisy dataset collected over 1.5 years, we show that many useful and interesting patterns can be computed. Interpretable social themes can also be deduced from the discovered patterns.
    @INPROCEEDINGS { phung_adams_venkatesh_locweb08,
        TITLE = { Computable Social Patterns from Sparse Sensor Data },
        AUTHOR = { Phung, D. and Adams, B. and Venkatesh, S. },
        BOOKTITLE = { First Int. Workshop on Location and the Web, in conjunction with the World Wide Web Conference },
        YEAR = { 2008 },
        ADDRESS = { Beijing, China },
        MONTH = { April },
        PAGES = { 69--72 },
        ABSTRACT = { We present a computational framework to automatically discover high-order temporal social patterns from very noisy and sparse location data. We introduce the concept of \emph{social footprint} and present a method to construct a codebook, enabling the transformation of raw sensor data into a collection of social pages. Each page captures the social activities of a user over a regular time period and is represented as a sequence of encoded footprints. Computable patterns are then defined as repeated structures found in these sequences. To do so, we appeal to modeling tools in document analysis and propose a Latent Social theme Dirichlet Allocation (LSDA) model -- a version of the Ngram topic model in \cite{Wang_el:07} with extra modeling of personal context. This model can be viewed as a Bayesian clustering method, jointly discovering temporal collocation of footprints and exploiting statistical strength across social pages, to automatically discover high-order patterns. Alternatively, it can be viewed as a dimensionality reduction method where the reduced latent space can be interpreted as the hidden social `theme' -- a more abstract perception of the user's daily activities. Applying this framework to a real-world noisy dataset collected over 1.5 years, we show that many useful and interesting patterns can be computed. Interpretable social themes can also be deduced from the discovered patterns. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { 2008/conferences/phung_adams_venkatesh_locweb08.pdf },
    }
C
  • Indoor Location Prediction Using Multiple Wireless Received Signal Strengths
    Tran, K., Phung, D., Adams, B. and Venkatesh, S.. In The 7th Australasian Data Mining Conference ({AusDM}), Adelaide, Australia, December 2008. [ | | ]
    This paper presents a framework for an indoor location prediction system using multiple wireless signals freely available in public or office spaces. We first propose an abstract architectural design for the system, outlining its key components and their functionalities. Different from existing work, such as robot indoor localization, which requires localization as precise as possible, our work focuses on a coarser grain: location prediction. Such a problem has great implications for context-aware systems such as indoor navigation or smart self-managed mobile devices (e.g., battery management). Central to these systems is an effective method to perform location prediction under different constraints, such as dealing with multiple wireless sources, the effects of human body heat, or the mobility of the users. To this end, the second part of this paper presents a comparative and comprehensive study of different choices for modeling signal strengths and prediction methods under different settings. The results show that with a simple but effective modeling method, almost perfect prediction accuracy can be achieved in a static environment, and up to 85\% in the presence of human movement. Finally, adopting the proposed framework, we outline a fully developed system, named Marauder, that supports user interface interaction and real-time voice-enabled location prediction.
    @INPROCEEDINGS { tran_phung_adams_venkatesh_ausdm08,
        TITLE = { Indoor Location Prediction Using Multiple Wireless Received Signal Strengths },
        AUTHOR = { Tran, K. and Phung, D. and Adams, B. and Venkatesh, S. },
        BOOKTITLE = { The 7th Australasian Data Mining Conference ({AusDM}) },
        YEAR = { 2008 },
        ADDRESS = { Adelaide, Australia },
        MONTH = { December },
        ABSTRACT = { This paper presents a framework for an indoor location prediction system using multiple wireless signals freely available in public or office spaces. We first propose an abstract architectural design for the system, outlining its key components and their functionalities. Different from existing work, such as robot indoor localization, which requires localization as precise as possible, our work focuses on a coarser grain: location prediction. Such a problem has great implications for context-aware systems such as indoor navigation or smart self-managed mobile devices (e.g., battery management). Central to these systems is an effective method to perform location prediction under different constraints, such as dealing with multiple wireless sources, the effects of human body heat, or the mobility of the users. To this end, the second part of this paper presents a comparative and comprehensive study of different choices for modeling signal strengths and prediction methods under different settings. The results show that with a simple but effective modeling method, almost perfect prediction accuracy can be achieved in a static environment, and up to 85\% in the presence of human movement. Finally, adopting the proposed framework, we outline a fully developed system, named Marauder, that supports user interface interaction and real-time voice-enabled location prediction. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { 2008/conferences/tran_phung_adams_venkatesh_ausdm08.pdf },
    }
C
  • Learning Discriminative Sequence Models from Partially Labelled Data for Activity Recognition
    Truyen, T., Bui, H., Phung, D. and Venkatesh, S.. In Tenth Pacific Rim International Conference on Artificial Intelligence {PRICAI}, Hanoi, Vietnam, December 2008. [ | | ]
    Recognising daily activity patterns of people from low-level sensory data is an important problem. Traditional approaches typically rely on generative models such as hidden Markov models and training on fully labelled data. While activity data can be readily acquired from pervasive sensors, e.g. in smart environments, providing manual labels to support fully supervised learning is often expensive. In this paper, we propose a new approach based on partially-supervised training of discriminative sequence models such as the conditional random field (CRF) and the maximum entropy Markov model (MEMM). We show that the approach can reduce the labelling effort and, at the same time, provides us with the flexibility and accuracy of the discriminative framework. Our experimental results in the video surveillance domain illustrate that these models can perform better than their generative counterpart (i.e. the partially hidden Markov model), even when a substantial amount of labels is unavailable.
    @INPROCEEDINGS { truyen_bui_phung_venkatesh_pricai08,
        TITLE = { Learning Discriminative Sequence Models from Partially Labelled Data for Activity Recognition },
        AUTHOR = { Truyen, T. and Bui, H. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Tenth Pacific Rim International Conference on Artificial Intelligence {PRICAI} },
        YEAR = { 2008 },
        ADDRESS = { Hanoi, Vietnam },
        MONTH = { December },
        ABSTRACT = { Recognising daily activity patterns of people from low-level sensory data is an important problem. Traditional approaches typically rely on generative models such as hidden Markov models and training on fully labelled data. While activity data can be readily acquired from pervasive sensors, e.g. in smart environments, providing manual labels to support fully supervised learning is often expensive. In this paper, we propose a new approach based on partially-supervised training of discriminative sequence models such as the conditional random field (CRF) and the maximum entropy Markov model (MEMM). We show that the approach can reduce the labelling effort and, at the same time, provides us with the flexibility and accuracy of the discriminative framework. Our experimental results in the video surveillance domain illustrate that these models can perform better than their generative counterpart (i.e. the partially hidden Markov model), even when a substantial amount of labels is unavailable. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { 2008/conferences/truyen_bui_phung_venkatesh_pricai08.pdf },
    }
C
  • Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data
    Truyen, T., Phung, D., Bui, H. and Venkatesh, S.. Advances in Neural Information Processing Systems (NIPS), December 2008. [ | | ]
    Inspired by the hierarchical hidden Markov model (HHMM), we present the hierarchical conditional random field (HCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We demonstrate the HCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases.
    @ARTICLE { truyen_phung_bui_venkatesh_nips08,
        TITLE = { Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data },
        AUTHOR = { Truyen, T. and Phung, D. and Bui, H. and Venkatesh, S. },
        JOURNAL = { Advances in Neural Information Processing Systems (NIPS) },
        YEAR = { 2008 },
        MONTH = { December },
        ABSTRACT = { Inspired by the hierarchical hidden Markov model (HHMM), we present the hierarchical conditional random field (HCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We demonstrate the HCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. },
        ADDRESS = { Vancouver, Canada },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { 2008/conferences/truyen_phung_bui_venkatesh_nips08.pdf },
    }
J
  • Hierarchical conditional random fields for recursive sequential data
    Truyen, T., Phung, D., Bui, H. and Venkatesh, S.. Technical report, Department of Computing, Curtin University of Technology, 2008. (TR-Nov-2008). [ | | ]
    Inspired by the hierarchical hidden Markov model (HHMM), we present the hierarchical conditional random field (HCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We demonstrate the HCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases.
    @TECHREPORT { truyen_phung_bui_venkatesh_tr08,
        TITLE = { Hierarchical conditional random fields for recursive sequential data },
        AUTHOR = { Truyen, T. and Phung, D. and Bui, H. and Venkatesh, S. },
        INSTITUTION = { Department of Computing, Curtin University of Technology },
        YEAR = { 2008 },
        NOTE = { TR-Nov-2008 },
        ABSTRACT = { Inspired by the hierarchical hidden Markov model (HHMM), we present the hierarchical conditional random field (HCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we consider partially-supervised learning and propose algorithms for generalised partially-supervised learning and constrained inference. We demonstrate the HCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { 2008/reports/trueyn_phung_bui_venkatesh_tr08.pdf },
    }
R
  • Constrained Sequence Classification for Lexical Disambiguation
    Truyen, T., Phung, D. and Venkatesh, S.. In Tenth Pacific Rim International Conference on Artificial Intelligence {PRICAI}, Hanoi, Vietnam, December 2008. [ | | ]
    This paper addresses lexical ambiguity, focusing on a particular problem known as accent prediction: given an accentless sequence, we need to restore the correct accents. This can be modelled as a sequence classification problem to which variants of Markov chains can be applied. Although the state space is large (about the vocabulary size), it is highly constrained when conditioned on the observed data. We investigate the application of several methods, including Powered Product-of-N-grams, Structured Perceptron and Conditional Random Fields (CRFs). We empirically show in the Vietnamese case that these methods are fairly robust and efficient. The second-order CRFs achieve the best results with about 94\% term accuracy.
    @INPROCEEDINGS { truyen_phung_venkatesh_pricai08,
        TITLE = { Constrained Sequence Classification for Lexical Disambiguation },
        AUTHOR = { Truyen, T. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Tenth Pacific Rim International Conference on Artificial Intelligence {PRICAI} },
        YEAR = { 2008 },
        ADDRESS = { Hanoi, Vietnam },
        MONTH = { December },
        ABSTRACT = { This paper addresses lexical ambiguity, focusing on a particular problem known as accent prediction: given an accentless sequence, we need to restore the correct accents. This can be modelled as a sequence classification problem to which variants of Markov chains can be applied. Although the state space is large (about the vocabulary size), it is highly constrained when conditioned on the observed data. We investigate the application of several methods, including Powered Product-of-N-grams, Structured Perceptron and Conditional Random Fields (CRFs). We empirically show in the Vietnamese case that these methods are fairly robust and efficient. The second-order CRFs achieve the best results with about 94\% term accuracy. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { 2008/conferences/truyen_phung_venkatesh_pricai08.pdf },
    }
C
  • YouTube and I Find: Personalizing Multimedia Content Access
    Venkatesh, S., Adams, B., Phung, D., Dorai, C., Farrell, R., Agnihotri, L. and Dimitrova, N.. Proceedings of the {IEEE}, Special Issue on Advances in Multimedia and Information Retrieval, 96(4):697-711, April 2008. [ | ]
    Recent growth in broadband access and the proliferation of small personal devices that capture images and videos has led to explosive growth of multimedia content available everywhere, from personal disks to the Web. While digital media capture and upload has become nearly universal with newer device technology, there is still a need for better tools and technologies to search large collections of multimedia data and to find and deliver the right content to a user according to her current needs and preferences. A renewed focus on the subjective dimension in the multimedia lifecycle, from creation and distribution to delivery and consumption, is required to address this need beyond what is feasible today. Integration of the subjective aspects of the media itself -- its affective, perceptual, and physiological potential (both intended and achieved) -- together with those of the users themselves will allow for personalizing content access beyond today's facility. This integration, transforming the traditional multimedia information retrieval (MIR) indices to more effectively answer specific user needs, will allow a richer degree of personalization predicated on user intention and mode of interaction, relationship to the producer, content of the media, and the user's history and lifestyle. In this paper, we identify the challenges in achieving this integration, current approaches to interpreting content creation processes, to user modelling and profiling, and to personalized content selection, and detail future directions. The structure of the paper is as follows: In Section I, we introduce the problem and present some definitions. In Section II, we review the aspects of personalized content and current approaches to the same. Section III discusses the problem of obtaining the metadata required for personalized media creation and presents eMediate as a case study of an integrated media capture environment. Section IV presents the MAGIC system as a case study in putting users first in distributed learning delivery. The aspects of modelling the user are presented in Section V as a case study in using the user's personality as a way to personalize summaries. Finally, Section VI concludes the paper with a discussion of the emerging challenges and open problems.
    @ARTICLE { venkatesh_adams_phung_etal_pieee08,
        TITLE = { {YouTube and I Find}: Personalizing Multimedia Content Access },
        AUTHOR = { Venkatesh, S. and Adams, B. and Phung, D. and Dorai, C. and Farrell, R. and Agnihotri, L. and Dimitrova, N. },
        JOURNAL = { Proceedings of the {IEEE}, Special Issue on Advances in Multimedia and Information Retrieval },
        YEAR = { 2008 },
        MONTH = { April },
        NUMBER = { 4 },
        PAGES = { 697-711 },
        VOLUME = { 96 },
        ABSTRACT = { Recent growth in broadband access and the proliferation of small personal devices that capture images and videos has led to explosive growth of multimedia content available everywhere, from personal disks to the Web. While digital media capture and upload has become nearly universal with newer device technology, there is still a need for better tools and technologies to search large collections of multimedia data and to find and deliver the right content to a user according to her current needs and preferences. A renewed focus on the subjective dimension in the multimedia lifecycle, from creation and distribution to delivery and consumption, is required to address this need beyond what is feasible today. Integration of the subjective aspects of the media itself -- its affective, perceptual, and physiological potential (both intended and achieved) -- together with those of the users themselves will allow for personalizing content access beyond today's facility. This integration, transforming the traditional multimedia information retrieval (MIR) indices to more effectively answer specific user needs, will allow a richer degree of personalization predicated on user intention and mode of interaction, relationship to the producer, content of the media, and the user's history and lifestyle. In this paper, we identify the challenges in achieving this integration, current approaches to interpreting content creation processes, to user modelling and profiling, and to personalized content selection, and detail future directions. The structure of the paper is as follows: In Section I, we introduce the problem and present some definitions. In Section II, we review the aspects of personalized content and current approaches to the same. Section III discusses the problem of obtaining the metadata required for personalized media creation and presents eMediate as a case study of an integrated media capture environment. Section IV presents the MAGIC system as a case study in putting users first in distributed learning delivery. The aspects of modelling the user are presented in Section V as a case study in using the user's personality as a way to personalize summaries. Finally, Section VI concludes the paper with a discussion of the emerging challenges and open problems. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
J
  • Robust Wifi Localization Using Received Signal Strength
    Khan, F., Phung, D. and Venkatesh, S.. Technical report, Department of Computing, Curtin University of Technology, 2007. (TR-June-2007). [ | | ]
    We present an investigation into the problem of robust WiFi localization using received signal strength. Various RF-based probabilistic WiFi localization models that use fingerprinting for location determination have been developed, but they are often too complicated and dependent on user and infrastructure information. We propose three models, which also take into account that access points (APs) are randomly distributed, to develop a robust model that is simple, fast, generic, independent of user information, and able to withstand infrastructure change. These models are the naive Bayes classifier and its modifications. We use two techniques, sampling and a Boolean matrix, to exploit the temporal and spatial constraints respectively. To the best of our knowledge, this is the first time the Boolean matrix has been used to exploit spatial constraints without user information. The use of sampling gives us the flexibility of varying the sampling period to obtain a delicate balance between latency and robustness. We employ a confusion matrix and ranking statistics to analyse the performance of the classifier. An implementation of the method shows that we can obtain accuracy of up to 92.4\% and 98.6\% for the rank-1 and rank-2 classifiers respectively, with a sampling period of 2s.
    @TECHREPORT { khan_phung_venkatesh_tr07,
        TITLE = { Robust Wifi Localization Using Received Signal Strength },
        AUTHOR = { Khan, F. and Phung, D. and Venkatesh, S. },
        INSTITUTION = { Department of Computing, Curtin University of Technology },
        YEAR = { 2007 },
        NOTE = { TR-June-2007 },
        ABSTRACT = { We present an investigation into the problem of robust WiFi localization using received signal strength. Various RF-based probabilistic WiFi localization models that use fingerprinting for location determination have been developed, but they are often too complicated and dependent on user and infrastructure information. We propose three models, which also take into account that access points (APs) are randomly distributed, to develop a robust model that is simple, fast, generic, independent of user information, and able to withstand infrastructure change. These models are the naive Bayes classifier and its modifications. We use two techniques, sampling and a Boolean matrix, to exploit the temporal and spatial constraints respectively. To the best of our knowledge, this is the first time the Boolean matrix has been used to exploit spatial constraints without user information. The use of sampling gives us the flexibility of varying the sampling period to obtain a delicate balance between latency and robustness. We employ a confusion matrix and ranking statistics to analyse the performance of the classifier. An implementation of the method shows that we can obtain accuracy of up to 92.4\% and 98.6\% for the rank-1 and rank-2 classifiers respectively, with a sampling period of 2s. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { 2007/reports/khan_phung_venkatesh_tr07.pdf },
    }
R
  • The Hidden Permutation Model and Location-Based Activity Recognition
    Phan, H., Phung, D., Bui, H. and Venkatesh, S.. Technical report, Department of Computing, Curtin University of Technology, 2007. (TR-Feb-2007). [ | | ]
    An open challenge arising in many applications is to model and recognize permuted objects, where each permutation captures a particular arrangement of objects produced by strong or weak ordering constraints between the objects. We introduce a novel Hidden Permutation Model (HPM) that models permuted states, and we apply it to the problem of human activity recognition. The HPM parameterizes an exponential distribution over permutations to capture the ordering constraints among states. To make the model scalable and overcome the intractability of $O(n!)$ permutations, a smart flipping-proposal MCMC is used for the inference task. Maximum likelihood parameter estimation is also presented for both the observed and hidden variables cases. Simulation results compare the proposed model to various rivals including the HMM and the Naive Bayes classifier (NBC). We also demonstrate an application of the model in a real-world scenario, combined with density-based clustering, to recognize activities on campus using GPS signals.
    @TECHREPORT { phan_phung_bui_venkatesh_tr07,
        TITLE = { The Hidden Permutation Model and Location-Based Activity Recognition },
        AUTHOR = { Phan, H. and Phung, D. and Bui, H. and Venkatesh, S. },
        INSTITUTION = { Department of Computing, Curtin University of Technology },
        YEAR = { 2007 },
        NOTE = { TR-Feb-2007 },
        ABSTRACT = { An open challenge arising in many applications is to model and recognize permuted objects, where each permutation captures a particular arrangement of objects produced by strong or weak ordering constraints between the objects. We introduce a novel Hidden Permutation Model (HPM) that models permuted states, and we apply it to the problem of human activity recognition. The HPM parameterizes an exponential distribution over permutations to capture the ordering constraints among states. To make the model scalable and overcome the intractability of $O(n!)$ permutations, a smart flipping-proposal MCMC is used for the inference task. Maximum likelihood parameter estimation is also presented for both the observed and hidden variables cases. Simulation results compare the proposed model to various rivals including the HMM and the Naive Bayes classifier (NBC). We also demonstrate an application of the model in a real-world scenario, combined with density-based clustering, to recognize activities on campus using GPS signals. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { 2007/reports/phan_phung_bui_venkatesh_tr07.pdf },
    }
R
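The flipping-proposal MCMC mentioned in the abstract can be sketched generically as a Metropolis sampler over permutations that swaps adjacent positions, so the $O(n!)$ normalising constant is never computed. The energy function below is a toy stand-in, not the HPM's actual exponential-family parameterization:

```python
import math
import random

def metropolis_permutations(energy, n, steps=10000, seed=0):
    """Sample permutations from p(sigma) proportional to exp(-energy(sigma))
    using adjacent-transposition ('flip') proposals."""
    rng = random.Random(seed)
    sigma = list(range(n))
    e = energy(sigma)
    samples = []
    for _ in range(steps):
        i = rng.randrange(n - 1)                 # propose flipping positions i, i+1
        sigma[i], sigma[i + 1] = sigma[i + 1], sigma[i]
        e_new = energy(sigma)
        if rng.random() < math.exp(min(0.0, e - e_new)):
            e = e_new                            # accept the flip
        else:
            sigma[i], sigma[i + 1] = sigma[i + 1], sigma[i]  # reject: undo it
        samples.append(tuple(sigma))
    return samples

# Toy energy favouring the identity ordering (illustrative only).
energy = lambda s: sum(abs(v - i) for i, v in enumerate(s))
draws = metropolis_permutations(energy, n=4, steps=5000)
```

Because each proposal touches only two positions, the acceptance ratio depends on an energy difference alone, which is what makes the scheme scale past tiny `n`.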
  • Preference Networks: probabilistic models for recommendation systems
    Truyen, T., Phung, D. and Venkatesh, S.. In The 6th Australasian Data Mining Conference (AusDM), Gold Coast, Australia, Dec 2007. [ | | ]
    Recommender systems are important in helping users select relevant and personalised information from the massive amounts of data available. We propose a unified framework called the Preference Network (PN) that jointly models various types of domain knowledge for the task of recommendation. The PN is a probabilistic model that systematically combines both content-based filtering and collaborative filtering into a single conditional Markov random field. Once estimated, it serves as a probabilistic database that supports various useful queries such as rating prediction and top-N recommendation. To handle the challenging problem of learning large networks of users and items, we employ a simple but effective pseudo-likelihood with regularisation. Experiments on movie-rating data demonstrate the merits of the PN.
    @INPROCEEDINGS { truyen_phung_venkatesh_ausdm07,
        TITLE = { Preference Networks: probabilistic models for recommendation systems },
        AUTHOR = { Truyen, T. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { The 6th Australasian Data Mining Conference (AusDM) },
        YEAR = { 2007 },
        ADDRESS = { Gold Coast, Australia },
        MONTH = { Dec },
        ABSTRACT = { Recommender systems are important in helping users select relevant and personalised information from the massive amounts of data available. We propose a unified framework called the Preference Network (PN) that jointly models various types of domain knowledge for the task of recommendation. The PN is a probabilistic model that systematically combines both content-based filtering and collaborative filtering into a single conditional Markov random field. Once estimated, it serves as a probabilistic database that supports various useful queries such as rating prediction and top-N recommendation. To handle the challenging problem of learning large networks of users and items, we employ a simple but effective pseudo-likelihood with regularisation. Experiments on movie-rating data demonstrate the merits of the PN. },
        COMMENT = { coauthor },
        FILE = { :papers\\phung\\truyen_phung_venkatesh_ausdm07.pdf:PDF },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { 2007/conferences/truyen_phung_venkatesh_ausdm07.pdf },
    }
C
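The regularised pseudo-likelihood used for learning can be illustrated on a generic binary pairwise MRF. This is a simplification, not the Preference Network itself: the Ising-style potentials and the `l2` weight are illustrative assumptions.

```python
import math

def pseudo_log_likelihood(J, h, x, l2=0.1):
    """Regularised pseudo-log-likelihood of a configuration x (values +/-1) of a
    binary pairwise MRF: sum_i log p(x_i | rest) - l2 * ||J||^2. Each conditional
    is a cheap logistic, so the global partition function is never needed."""
    n = len(x)
    ll = 0.0
    for i in range(n):
        field = h[i] + sum(J[i][j] * x[j] for j in range(n) if j != i)
        ll += -math.log1p(math.exp(-2.0 * x[i] * field))   # log sigmoid(2 x_i field)
    reg = l2 * sum(J[i][j] ** 2 for i in range(n) for j in range(n))
    return ll - reg

# With an attractive coupling, the aligned configuration scores higher.
J, h = [[0.0, 1.0], [1.0, 0.0]], [0.0, 0.0]
aligned = pseudo_log_likelihood(J, h, [1, 1])
opposed = pseudo_log_likelihood(J, h, [1, -1])
```

Replacing the intractable likelihood with this product of local conditionals is what makes learning feasible on large user-item networks.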
  • Extraction of social context and application to personal multimedia exploration
    Adams, B., Phung, D. and Venkatesh, S.. In ACM Int. Conference on Multimedia, Santa Barbara, USA, Oct. 2006. [ | ]
    Personal media collections are often viewed and managed along the social dimension: the places we spend time at and the people we see; thus, tools for extracting and using this information are required. We present novel algorithms for unobtrusively identifying socially significant places, termed social spheres, from GPS traces of daily life, and for labelling each as Home, Work, or Other, with quantitative evaluation on 9 months of data from 5 users. We extract locational co-presence of these users and formulate a novel measure of social tie strength based on the frequency of interaction and the nature of the spheres it occurs within. Comparative user studies of a multimedia browser designed to demonstrate the utility of social metadata indicate the usefulness of a simple interface allowing navigation and filtering in these terms. We note that the application of social context is potentially much broader than personal media management, including context-aware device behaviour, life logs, social networks, and location-aware information services.
    @INPROCEEDINGS { adams_phung_venkatesh_acmmm06,
        TITLE = { Extraction of social context and application to personal multimedia exploration },
        AUTHOR = { Adams, B. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { ACM Int. Conference on Multimedia },
        ADDRESS = { Santa Barbara, USA },
        YEAR = { 2006 },
        MONTH = { Oct. },
        ABSTRACT = { Personal media collections are often viewed and managed along the social dimension, the places we spend time at and the people we see, thus tools for extracting and using this information are required. We present novel algorithms for identifying socially significant places termed social spheres unobtrusively from GPS traces of daily life, and label them as one of Home, Work, or Other, with quantitative evaluation of 9 months taken from 5 users. We extract locational co-presence of these users and formulate a novel measure of social tie strength based on frequency of interaction, and the nature of spheres it occurs within. Comparative user studies of a multimedia browser designed to demonstrate the utility of social metadata indicate the usefulness of a simple interface allowing navigation and filtering in these terms. We note the application of social context is potentially much broader than personal media management, including context-aware device behaviour, life logs, social networks, and location-aware information services. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
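A minimal sketch of significant-place ("social sphere") extraction of the kind the paper describes, assuming a trace of `(seconds, lat, lon)` fixes; the distance and dwell thresholds are hypothetical, and the paper's actual method (including the Home/Work/Other labelling) is richer:

```python
import math

def stay_points(trace, dist_m=100.0, min_s=1800):
    """Find maximal runs of GPS fixes that stay within dist_m metres of the
    run's first fix for at least min_s seconds; return (t_start, t_end, lat, lon)."""
    def metres(p, q):
        # Equirectangular approximation: adequate at city scale.
        dlat = math.radians(q[1] - p[1])
        dlon = math.radians(q[2] - p[2]) * math.cos(math.radians(p[1]))
        return 6371000.0 * math.hypot(dlat, dlon)
    spheres, i = [], 0
    while i < len(trace):
        j = i
        while j + 1 < len(trace) and metres(trace[i], trace[j + 1]) <= dist_m:
            j += 1
        if trace[j][0] - trace[i][0] >= min_s:   # dwelt long enough: keep it
            lat = sum(p[1] for p in trace[i:j + 1]) / (j - i + 1)
            lon = sum(p[2] for p in trace[i:j + 1]) / (j - i + 1)
            spheres.append((trace[i][0], trace[j][0], lat, lon))
        i = j + 1
    return spheres

# One hour near a single point, then a jump far away.
trace = [(0, -32.0, 115.9), (1800, -32.0001, 115.9),
         (3600, -32.0, 115.9001), (3700, -33.0, 116.9)]
spheres = stay_points(trace)
```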
  • Human Behavior Recognition with Generic Exponential Family Duration Modeling in the Hidden Semi-Markov Model
    Duong, T., Phung, D., Bui, H. and Venkatesh, S.. In International Conference on Pattern Recognition, pages 202-207, Hong Kong, 2006. [ | ]
    The ability to learn and recognize human activities of daily living (ADLs) is important in building pervasive and smart environments. In this paper, we tackle this problem using the hidden semi-Markov model. We discuss the state-of-the-art duration-modeling choices and then address a large class of exponential family distributions to model state durations. Inference and learning are efficiently addressed by providing a graphical representation for the model in terms of a dynamic Bayesian network (DBN). We investigate both discrete and continuous distributions from the exponential family (Poisson and Inverse Gaussian, respectively) for the problem of learning and recognizing ADLs. A full comparison between the exponential family duration models and other existing models, including the traditional multinomial and the new Coxian, is also presented. Our work thus completes a thorough investigation into duration modeling and its application to human activity recognition in a real-world smart-home surveillance scenario.
    @INPROCEEDINGS { duong_phung_bui_venkatesh_icpr06,
        TITLE = { Human Behavior Recognition with Generic Exponential Family Duration Modeling in the Hidden Semi-Markov Model },
        AUTHOR = { Duong, T. and Phung, D. and Bui, H. and Venkatesh, S. },
        BOOKTITLE = { International Conference on Pattern Recognition },
        YEAR = { 2006 },
        ADDRESS = { Hong Kong },
        PAGES = { 202--207 },
        VOLUME = { 3 },
        ABSTRACT = { The ability to learn and recognize human activities of daily living (ADLs) is important in building pervasive and smart environments. In this paper, we tackle this problem using the hidden semi-Markov model. We discuss the state-of-the-art duration-modeling choices and then address a large class of exponential family distributions to model state durations. Inference and learning are efficiently addressed by providing a graphical representation for the model in terms of a dynamic Bayesian network (DBN). We investigate both discrete and continuous distributions from the exponential family (Poisson and Inverse Gaussian, respectively) for the problem of learning and recognizing ADLs. A full comparison between the exponential family duration models and other existing models, including the traditional multinomial and the new Coxian, is also presented. Our work thus completes a thorough investigation into duration modeling and its application to human activity recognition in a real-world smart-home surveillance scenario. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
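The difference between an HMM's implicit geometric dwell times and the explicit exponential-family durations studied above can be shown with a small generative sketch. The Poisson duration sampler and the two-state parameters are illustrative assumptions, not the paper's trained models:

```python
import random

def poisson(lam):
    """Knuth's Poisson sampler, returned as a closure over the rate lam."""
    def draw(rng):
        L, k, p = pow(2.718281828459045, -lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1
    return draw

def sample_hsmm(pi, A, dur, T, seed=0):
    """Generate T time-steps of states from a hidden semi-Markov model: each
    state's dwell time is drawn from its own duration model instead of the
    geometric dwell implied by a plain HMM's self-transition."""
    rng = random.Random(seed)
    states = []
    s = rng.choices(range(len(pi)), weights=pi)[0]
    while len(states) < T:
        d = max(1, dur[s](rng))          # explicit duration for this state visit
        states.extend([s] * d)
        s = rng.choices(range(len(A)), weights=A[s])[0]
    return states[:T]

seq = sample_hsmm(pi=[0.5, 0.5],
                  A=[[0.0, 1.0], [1.0, 0.0]],   # alternate between two activities
                  dur=[poisson(5.0), poisson(2.0)],
                  T=50)
```

Swapping `poisson` for any other exponential-family sampler (e.g. inverse Gaussian) changes only the `dur` entries, which mirrors the generic-duration point of the paper.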
  • A probabilistic model with parsimonious representation for sensor fusion in recognizing activity in pervasive environment
    Tran, D., Phung, D., Bui, H. and Venkatesh, S.. In International Conference on Pattern Recognition, pages 168-172, Hong Kong, 2006. [ | ]
    To tackle the problem of increasing numbers of state-transition parameters as the number of sensors grows, we present a probabilistic model together with several parsimonious representations for sensor fusion. These include context-specific independence (CSI), mixtures of smaller multinomials and softmax function representations to compactly represent the state transitions of a large number of sensors. The model is evaluated on real-world data acquired through ubiquitous sensors in recognizing daily morning activities. The results show that the combination of CSI and mixtures of smaller multinomials achieves comparable performance with far fewer parameters.
    @INPROCEEDINGS { tran_phung_bui_venkatesh_icpr06,
        TITLE = { A probabilistic model with parsimonious representation for sensor fusion in recognizing activity in pervasive environment },
        AUTHOR = { Tran, D. and Phung, D. and Bui, H. and Venkatesh, S. },
        BOOKTITLE = { International Conference on Pattern Recognition },
        YEAR = { 2006 },
        ADDRESS = { Hong Kong },
        PAGES = { 168--172 },
        VOLUME = { 3 },
        ABSTRACT = { To tackle the problem of increasing numbers of state-transition parameters as the number of sensors grows, we present a probabilistic model together with several parsimonious representations for sensor fusion. These include context-specific independence (CSI), mixtures of smaller multinomials and softmax function representations to compactly represent the state transitions of a large number of sensors. The model is evaluated on real-world data acquired through ubiquitous sensors in recognizing daily morning activities. The results show that the combination of CSI and mixtures of smaller multinomials achieves comparable performance with far fewer parameters. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
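The parameter saving from a softmax transition representation can be made concrete with a toy sketch: a full multinomial table over K binary sensors needs O(S·2^K) entries, while a softmax form needs only O(S·K) weights. All weights and sizes below are hypothetical:

```python
import math

def softmax_transition(w, b, prev_sensors):
    """p(next state s | K binary sensor readings): linear score per state,
    normalised with a numerically stable softmax."""
    scores = [b[s] + sum(w[s][k] * prev_sensors[k] for k in range(len(prev_sensors)))
              for s in range(len(b))]
    z = max(scores)                       # subtract max for stability
    exps = [math.exp(v - z) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two states, two sensors: 6 weights instead of a 2 x 4 multinomial table.
probs = softmax_transition(w=[[1.0, -0.5], [0.2, 0.8]],
                           b=[0.0, 0.1],
                           prev_sensors=[1, 0])
```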
  • Fast tree-based learning and inference in Markov random fields and applications
    Truyen, T., Phung, D., Bui, H. and Venkatesh, S.. Technical report, Department of Computing, Curtin University of Technology, 2006. (TR-Dec-2006). [ | | ]
    Inference and learning for general-structure MRFs are usually intractable problems in computer vision. In this paper, we exploit a set of tree-based methods to efficiently address this problem and evaluate these methods against some current state-of-the-art approaches in three problems: scene segmentation, stereo matching and image denoising. Our method takes advantage of the tractability of tree structures embedded in MRFs to derive a tractable lower bound on the true likelihood; we propose the use of tree-based pseudo-likelihood (PL) for parameter estimation and of tree-based ICM (T-ICM) for MAP assignment. Unlike loopy belief propagation, our method is guaranteed to converge, and it does so with limited memory required to store the messages. Further, unlike Graph-Cuts, our T-ICM can be applied with arbitrary cost functions, such as those estimated during learning.
    @TECHREPORT { truyen_phung_bui_venkatesh_tr06,
        TITLE = { Fast tree-based learning and inference in Markov random fields and applications },
        AUTHOR = { Truyen, T. and Phung, D. and Bui, H. and Venkatesh, S. },
        INSTITUTION = { Department of Computing, Curtin University of Technology },
        YEAR = { 2006 },
        NOTE = { TR-Dec-2006 },
        ABSTRACT = { Inference and learning for general-structure MRFs are usually intractable problems in computer vision. In this paper, we exploit a set of tree-based methods to efficiently address this problem and evaluate these methods against some current state-of-the-art approaches in three problems: scene segmentation, stereo matching and image denoising. Our method takes advantage of the tractability of tree structures embedded in MRFs to derive a tractable lower bound on the true likelihood; we propose the use of tree-based pseudo-likelihood (PL) for parameter estimation and of tree-based ICM (T-ICM) for MAP assignment. Unlike loopy belief propagation, our method is guaranteed to converge, and it does so with limited memory required to store the messages. Further, unlike Graph-Cuts, our T-ICM can be applied with arbitrary cost functions, such as those estimated during learning. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { 2006/reports/truyen_phung_bui_venkatesh_tr06.pdf },
    }
R
  • AdaBoost.MRF: Boosted Markov Random Forests and Application to Multilevel Activity Recognition
    Truyen, T., Phung, D., Bui, H. and Venkatesh, S.. In Proc. of IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 1686-1693, New York, USA, June 2006. [ | ]
    Activity recognition is an important issue in building intelligent monitoring systems. We address the recognition of multilevel activities in this paper via a conditional Markov random field (MRF), known as the dynamic conditional random field (DCRF). Parameter estimation in general MRFs using maximum likelihood is known to be computationally challenging (except for extreme cases), and thus we propose an efficient boosting-based algorithm, AdaBoost.MRF, for this task. Distinct from most existing work, our algorithm can handle hidden variables (missing labels) and is particularly attractive for smart-house domains where reliable labels are often sparsely observed. Furthermore, our method works exclusively on trees and thus is guaranteed to converge. We apply the AdaBoost.MRF algorithm to a home video surveillance application and demonstrate its efficacy.
    @INPROCEEDINGS { truyen_phung_bui_venkatesh_cvpr06,
        TITLE = { {AdaBoost.MRF}: Boosted {M}arkov Random Forests and Application to Multilevel Activity Recognition },
        AUTHOR = { Truyen, T. and Phung, D. and Bui, H. and Venkatesh, S. },
        BOOKTITLE = { Proc. of IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR) },
        YEAR = { 2006 },
        ADDRESS = { New York, USA },
        MONTH = { June },
        PAGES = { 1686--1693 },
        ABSTRACT = { Activity recognition is an important issue in building intelligent monitoring systems. We address the recognition of multilevel activities in this paper via a conditional Markov random field (MRF), known as the dynamic conditional random field (DCRF). Parameter estimation in general MRFs using maximum likelihood is known to be computationally challenging (except for extreme cases), and thus we propose an efficient boosting-based algorithm, AdaBoost.MRF, for this task. Distinct from most existing work, our algorithm can handle hidden variables (missing labels) and is particularly attractive for smart-house domains where reliable labels are often sparsely observed. Furthermore, our method works exclusively on trees and thus is guaranteed to converge. We apply the AdaBoost.MRF algorithm to a home video surveillance application and demonstrate its efficacy. },
        COMMENT = { coauthor },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Activity Recognition and Abnormality Detection with the Switching Hidden Semi-Markov Model
    Duong, T., Bui, H., Phung, D. and Venkatesh, S.. In IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 838-845, San Diego, 20-26 June 2005. [ | ]
    This paper addresses the problem of learning and recognizing human activities of daily living (ADL), which is an important research issue in building a pervasive and smart environment. In dealing with ADL, we argue that it is beneficial to exploit both the inherent hierarchical organization of the activities and their typical duration. To this end, we introduce the Switching Hidden Semi-Markov Model (S-HSMM), a two-layered extension of the hidden semi-Markov model (HSMM) for the modeling task. Activities are modeled in the S-HSMM in two ways: the bottom layer represents atomic activities and their duration using HSMMs; the top layer represents a sequence of high-level activities where each high-level activity is made of a sequence of atomic activities. We consider two methods for modeling duration: the classic explicit duration model using the multinomial distribution, and the novel use of the discrete Coxian distribution. In addition, we propose an effective scheme to detect abnormality without the need for training on abnormal data. Experimental results show that the S-HSMM performs better than existing models including the flat HSMM and the hierarchical hidden Markov model in both classification and abnormality detection tasks, alleviating the need for presegmented training data. Furthermore, our discrete Coxian duration model yields better computation time and generalization error than the classic explicit duration model.
    @INPROCEEDINGS { duong_bui_phung_venkatesh_cvpr05,
        TITLE = { Activity Recognition and Abnormality Detection with the {S}witching {H}idden {S}emi-{M}arkov {M}odel },
        AUTHOR = { Duong, T. and Bui, H. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR) },
        YEAR = { 2005 },
        ADDRESS = { San Diego },
        MONTH = { 20-26 June },
        PAGES = { 838--845 },
        PUBLISHER = { IEEE Computer Society },
        VOLUME = { 1 },
        ABSTRACT = { This paper addresses the problem of learning and recognizing human activities of daily living (ADL), which is an important research issue in building a pervasive and smart environment. In dealing with ADL, we argue that it is beneficial to exploit both the inherent hierarchical organization of the activities and their typical duration. To this end, we introduce the Switching Hidden Semi-Markov Model (S-HSMM), a two-layered extension of the hidden semi-Markov model (HSMM) for the modeling task. Activities are modeled in the S-HSMM in two ways: the bottom layer represents atomic activities and their duration using HSMMs; the top layer represents a sequence of high-level activities where each high-level activity is made of a sequence of atomic activities. We consider two methods for modeling duration: the classic explicit duration model using the multinomial distribution, and the novel use of the discrete Coxian distribution. In addition, we propose an effective scheme to detect abnormality without the need for training on abnormal data. Experimental results show that the S-HSMM performs better than existing models including the flat HSMM and the hierarchical hidden Markov model in both classification and abnormality detection tasks, alleviating the need for presegmented training data. Furthermore, our discrete Coxian duration model yields better computation time and generalization error than the classic explicit duration model. },
        KEYWORDS = { Activity Recognition, Abnormality detection, semi-Markov, hierarchical HSMM },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Efficient Coxian Duration Modelling for Activity Recognition in Smart Environments with the Hidden semi-Markov Model
    Duong, Thi, Phung, Dinh, Bui, H.Hung and Venkatesh, Svetha. In 2005 International Conference on Intelligent Sensors, Sensor Networks and Information Processing, pages 277-282, Dec 2005. [ | | pdf | code]
    In this paper, we exploit the discrete Coxian distribution and propose a novel form of stochastic model, termed the Coxian hidden semi-Markov model (Cox-HSMM), and apply it to the task of recognising activities of daily living (ADLs) in a smart-house environment. The use of the Coxian has several advantages over traditional parameterizations (e.g. multinomial or continuous distributions), including the low number of free parameters needed, its computational efficiency, and the existence of closed-form solutions. To further enrich the model for real-world applications, we also address the problem of handling missing observations in the proposed Cox-HSMM. In the domain of ADLs, we emphasize the importance of duration information and model it via the Cox-HSMM. Our experimental results show the superiority of the Cox-HSMM over the standard HMM in all cases. They further show that outstanding recognition accuracy can be achieved with a relatively low number of phases in the Coxian, making the Cox-HSMM particularly suitable for recognizing ADLs, whose movement trajectories are typically very long in nature.
    @INPROCEEDINGS { duong_phung_bui_venkatesh_issnips05,
        AUTHOR = { Duong, Thi and Phung, Dinh and Bui, H.Hung and Venkatesh, Svetha },
        TITLE = { Efficient Coxian Duration Modelling for Activity Recognition in Smart Environments with the Hidden semi-Markov Model },
        BOOKTITLE = { 2005 International Conference on Intelligent Sensors, Sensor Networks and Information Processing },
        YEAR = { 2005 },
        PAGES = { 277--282 },
        MONTH = { Dec },
        ABSTRACT = { In this paper, we exploit the discrete Coxian distribution and propose a novel form of stochastic model, termed the Coxian hidden semi-Markov model (Cox-HSMM), and apply it to the task of recognising activities of daily living (ADLs) in a smart-house environment. The use of the Coxian has several advantages over traditional parameterizations (e.g. multinomial or continuous distributions), including the low number of free parameters needed, its computational efficiency, and the existence of closed-form solutions. To further enrich the model for real-world applications, we also address the problem of handling missing observations in the proposed Cox-HSMM. In the domain of ADLs, we emphasize the importance of duration information and model it via the Cox-HSMM. Our experimental results show the superiority of the Cox-HSMM over the standard HMM in all cases. They further show that outstanding recognition accuracy can be achieved with a relatively low number of phases in the Coxian, making the Cox-HSMM particularly suitable for recognizing ADLs, whose movement trajectories are typically very long in nature. },
        CODE = { https://github.com/DASCIMAL/CxHSMM },
        DOI = { 10.1109/ISSNIP.2005.1595592 },
        FILE = { :duong_phung_bui_venkatesh_issnips05 - Efficient Coxian Duration Modelling for Activity Recognition in Smart Environments with the Hidden Semi Markov Model.pdf:PDF },
        KEYWORDS = { Aging;Character recognition;Closed-form solution;Computational efficiency;Computerized monitoring;Degradation;Distributed computing;Hidden Markov models;Senior citizens;Stochastic processes },
        URL = { http://ieeexplore.ieee.org/abstract/document/1595592/ },
    }
C
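One common parameterization of a discrete Coxian can be sampled with a short routine, showing why only a couple of parameters per phase suffice for flexible duration shapes. This is an assumption for illustration; the paper's exact construction of the Coxian may differ:

```python
import random

def sample_discrete_coxian(mu, p, rng):
    """Sample a duration from an M-phase discrete Coxian: start in phase 0;
    at each time step, leave the current phase with probability mu[m]; on
    leaving, finish with probability p[m] or advance to the next phase
    (the last phase always finishes). Roughly 2M parameters in total."""
    m, d = 0, 0
    while True:
        d += 1
        if rng.random() < mu[m]:                    # leave phase m this step
            if m == len(mu) - 1 or rng.random() < p[m]:
                return d                            # absorbed: duration complete
            m += 1                                  # continue into the next phase

rng = random.Random(0)
durations = [sample_discrete_coxian(mu=[0.6, 0.4], p=[0.5], rng=rng)
             for _ in range(1000)]
```

Each phase contributes a geometric dwell, so the resulting duration is a mixture of sums of geometrics, which is what lets a few phases approximate long, non-geometric dwell times.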
  • Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Model
    Nguyen, N., Phung, D., Bui, H. and Venkatesh, S.. In Proc. of IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), pages 955-960, San Diego, 2005. [ | ]
    Directly modeling the inherent hierarchy and shared structures of human behaviors, we present an application of the hierarchical hidden Markov model (HHMM) to the problem of activity recognition. We argue that to robustly model and recognize complex human activities, it is crucial to exploit both the natural hierarchical decomposition and the shared semantics embedded in movement trajectories. To this end, we propose the use of the HHMM, a rich stochastic model that has recently been extended to handle shared structures, for representing and recognizing a set of complex indoor activities. Furthermore, given the need for real-time recognition, we propose a Rao-Blackwellised particle filter (RBPF) that efficiently computes the filtering distribution at a constant time complexity for each new observation arrival. The main contributions of this paper lie in the application of the shared-structure HHMM, the estimation of the model's parameters at all levels simultaneously, and the construction of an RBPF approximate inference scheme. The experimental results in a real-world environment confirm our belief that directly modeling shared structures not only reduces computational cost but also improves recognition accuracy when compared with the tree HHMM and the flat HMM.
    @INPROCEEDINGS { nguyen_phung_bui_venkatesh_cvpr05,
        TITLE = { Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Model },
        AUTHOR = { Nguyen, N. and Phung, D. and Bui, H. and Venkatesh, S. },
        BOOKTITLE = { Proc. of IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR) },
        YEAR = { 2005 },
        ADDRESS = { San Diego },
        PAGES = { 955--960 },
        PUBLISHER = { IEEE Computer Society },
        VOLUME = { 1 },
        ABSTRACT = { Directly modeling the inherent hierarchy and shared structures of human behaviors, we present an application of the hierarchical hidden Markov model (HHMM) to the problem of activity recognition. We argue that to robustly model and recognize complex human activities, it is crucial to exploit both the natural hierarchical decomposition and the shared semantics embedded in movement trajectories. To this end, we propose the use of the HHMM, a rich stochastic model that has recently been extended to handle shared structures, for representing and recognizing a set of complex indoor activities. Furthermore, given the need for real-time recognition, we propose a Rao-Blackwellised particle filter (RBPF) that efficiently computes the filtering distribution at a constant time complexity for each new observation arrival. The main contributions of this paper lie in the application of the shared-structure HHMM, the estimation of the model's parameters at all levels simultaneously, and the construction of an RBPF approximate inference scheme. The experimental results in a real-world environment confirm our belief that directly modeling shared structures not only reduces computational cost but also improves recognition accuracy when compared with the tree HHMM and the flat HMM. },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Probabilistic and Film Grammar Based Methods for Video Content Analysis
    Phung, Dinh. PhD thesis, Curtin University of Technology, Australia, 2005. [ | | pdf]
    In the last decade, we have truly entered the information age, with an explosion in the amount of digital data produced every day. The growing complexity and inadequacy of existing tools for managing this data proliferation have highlighted the need for better tools and techniques. Towards this end, this thesis explores Film Grammar and formal probabilistic models for video content analysis, using educational videos as the domain of investigation. To utilise Film Grammar, we base our work in a Computational Media Aesthetics framework as a systematic way of harnessing film theory in seeking meaningful descriptors. We study the aspects of 'grammars' that are peculiar to the educational genre to uncover the nature of the semantics and structural information presented. In our initial study, a hierarchy of narrative structural units for this video genre is proposed and automatically recognised. The hierarchy is learned using a set of features extracted from the audio and visual streams. The experimental results demonstrate the usefulness of the proposed hierarchy. Next, we exploit film theory to extract expressive elements, computed from low-level features, that are useful for educational videos. First, we examine the content density function as a measure of the 'rate of information delivered', study its key contributing factors and propose a computational form for it. Observing that a drop or rise in this function reflects important clues about the structure of the video, we propose heuristic and probabilistic algorithms for the detection of subtopic boundaries. We then advance this study with the extraction of the dramatic and thematic functions. The thematic function reflects the 'instructional' or 'informative' portions of the video, where the video-maker decides to interfere in the subject matter being presented.
The dramatic function, on the other hand, provides information about the 'dramatisation level' of the video, such as a film segment where an event is dramatised. Combining information from the content density and thematic functions allows us to further segment an educational video into a two-level hierarchy of topical content, namely at the main-topic and subtopic levels. In seeking formal probabilistic models, our key observation is that semantic concepts in the video domain possess a natural hierarchical decomposition, and, more noticeably, there is tight inheritance of semantics in the hierarchy. Here, we theoretically extend the tree-structured Hierarchical Hidden Markov Model (HHMM) of Fine et al. (1998) to allow arbitrary sharing at any level in the topology of the model. We show how exact inference and learning can be done in this general case with the same complexity as in Fine et al. (1998). To deal with long observation sequences, we propose a novel scaling algorithm to avoid numerical underflow. In addition, we propose a new generalised Viterbi algorithm and address the issue of continuous observations for this model. Following the theoretical extension to the HHMM, we present two applications exploiting the expressiveness of shared structures in the HHMM for the problem of semantic analysis and segmentation in educational videos. First, it is shown that subtopic boundary transitions can be detected in this framework. Second, we show that useful narrative structures can be learned automatically with the HHMM. In both applications, domain knowledge is utilised as prior information to construct the topology of the HHMM.
    @PHDTHESIS { phung_phd05_probabilistic,
        AUTHOR = { Phung, Dinh },
        SCHOOL = { Curtin University of Technology, Australia },
        TITLE = { Probabilistic and Film Grammar Based Methods for Video Content Analysis },
        YEAR = { 2005 },
        ABSTRACT = { In the last decade, we have truly entered the information age with an explosion in the amount of digital data produced every day. The growing complexity and inadequacy of existing tools for managing this data proliferation have highlighted the need for better tools and techniques. Towards this end, this thesis aims at exploring Film Grammar and formal probabilistic models for video content analysis, in which the educational videos are used as the domain of investigation. To utilise Film Grammar, we base our work in a Computational Media Aesthetics framework as a systematic way of harnessing film theory in seeking meaningful descriptors. We study the aspects of `grammars' that are peculiar to the educational genre to uncover the nature of semantics and structural information presented. In our initial study, a hierarchy of narrative structural units for this video genre is proposed and automatically recognised. The hierarchy is learned using a set of features extracted from audio and visual streams. The experimental results demonstrate the usefulness of the proposed hierarchy. Next, we exploit film theory to extract {\em expressive} elements, computed from low-level features, that are useful for educational videos. First, we examine the {\em content density} function as a measure of the `rate of information delivered', study its key contributing factors and propose a computational form for it. Observing that a drop or rise in this function reflects important clues about the structure of the video, we propose heuristic and probabilistic algorithms for the detection of subtopic boundaries. We then advance this study with the extraction of the {\em dramatic} and {\em thematic} functions. The thematic function reflects the `instructional' or `informative' portions in the video where the video-maker decides to interfere in the subject matter being presented. 
The dramatic function, on the other hand, provides information about the `dramatisation level' of the video, such as a film segment where an event is dramatised. Combining information from the content density and thematic functions allows us to further segment an educational video into a two-level hierarchy of topical content, namely at main topic and subtopic levels. In seeking formal probabilistic models, our key observation is that semantic concepts in the video domain possess a natural hierarchical decomposition, and more noticeably there is {\em tight} inheritance of semantics in the hierarchy. Here, we theoretically extend the tree-structure Hierarchical Hidden Markov Models (\HHMM) in~\citep{FINE_EL-98} to allow arbitrary sharing at any level in the topology of the model. We show how the exact inference and learning can be done in this general case with the same complexity as in~\citep{FINE_EL-98}. To deal with long observation sequences, we propose a novel scaling algorithm to avoid numerical underflow. In addition, we also propose a new generalised Viterbi algorithm and address the issue of continuous observations for this model. Following the theoretical extension to the HHMM, we present two applications exploiting the expressiveness of shared structures in the HHMM for the problem of semantic analysis and segmentation in educational videos. First, it is shown that subtopic boundary transitions can be detected in this framework. Second, we show that useful narrative structures can be learned automatically with the HHMM. In both applications, the domain knowledge is utilised as prior information to construct the topology of the HHMM. },
        FILE = { :phung_phd05_probabilistic - Probabilistic and Film Grammar Based Methods for Video Content Analysis.pdf:PDF },
        GROUP = { abstract HMM, behaviour recognition, HHMM },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
        URL = { http://prada-research.net/~dinh/uploads/Main/Publications/phung_phd05.pdf },
    }
PT
  • Topic Transition Detection Using Hierarchical Hidden Markov and Semi-Markov Models
    Phung, D., Duong, T., Bui, H. and Venkatesh, S. In ACM Int. Conf. on Multimedia (ACM-MM), Singapore, 6--11 Nov. 2005. [ | ]
    In this paper we introduce a probabilistic framework to exploit hierarchy, structure sharing and duration information for topic transition detection in videos. Our probabilistic detection framework is a combination of a shot classification step and a detection phase using hierarchical probabilistic models. We consider two models in this paper: the extended Hierarchical Hidden Markov Model (HHMM) and the Coxian Switching Hidden semi-Markov Model (S-HSMM) because they allow the natural decomposition of semantics in videos, including shared structures, to be modeled directly, and thus enable efficient inference and reduce the sample complexity in learning. Additionally, the S-HSMM allows the duration information to be incorporated, consequently the modeling of long-term dependencies in videos is enriched through both hierarchical and duration modeling. Furthermore, the use of Coxian distribution in the S-HSMM makes it tractable to deal with long sequences in video. Our experimentation of the proposed framework on twelve educational and training videos shows that both models outperform the baseline cases (flat HMM and HSMM) and performances reported in earlier work in topic detection. The superior performance of the S-HSMM over the HHMM verifies our belief that the duration information is an important factor in video content modeling.
    @INPROCEEDINGS { phung_duong_bui_venkatesh_acmmm05,
        TITLE = { Topic Transition Detection Using Hierarchical Hidden Markov and Semi-Markov Models },
        AUTHOR = { Phung, D. and Duong, T. and Bui, H. and Venkatesh, S. },
        BOOKTITLE = { ACM Int. Conf. on Multimedia (ACM-MM) },
        YEAR = { 2005 },
        ADDRESS = { Singapore },
        MONTH = { 6--11 Nov. },
        ABSTRACT = { In this paper we introduce a probabilistic framework to exploit hierarchy, structure sharing and duration information for topic transition detection in videos. Our probabilistic detection framework is a combination of a shot classification step and a detection phase using hierarchical probabilistic models. We consider two models in this paper: the extended Hierarchical Hidden Markov Model (HHMM) and the Coxian Switching Hidden semi-Markov Model (S-HSMM) because they allow the natural decomposition of semantics in videos, including shared structures, to be modeled directly, and thus enable efficient inference and reduce the sample complexity in learning. Additionally, the S-HSMM allows the duration information to be incorporated, consequently the modeling of long-term dependencies in videos is enriched through both hierarchical and duration modeling. Furthermore, the use of Coxian distribution in the S-HSMM makes it tractable to deal with long sequences in video. Our experimentation of the proposed framework on twelve educational and training videos shows that both models outperform the baseline cases (flat HMM and HSMM) and performances reported in earlier work in topic detection. The superior performance of the S-HSMM over the HHMM verifies our belief that the duration information is an important factor in video content modeling. },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Structural Unit Identification and Segmentation of Topical Content in Educational Videos
    Phung, Dinh and Venkatesh, Svetha. Technical report, Department of Computing, Curtin University of Technology, 2005. (TR-May-2005). [ | ]
    Automatically structuralising educational video is a challenging problem for efficient content management and cataloging in E-learning environments. This paper addresses this problem and aims to achieve two objectives. First, we propose a hierarchy of narrative structures for this film genre based on a knowledge of production means for instructional media. We present a useful set of audiovisual features devised for this problem, along with a hierarchical decision tree-based classification system to determine and discriminate between these structures. The second goal is to partition educational video into topical sections. We propose a novel content density function to delineate sections underscored by changes in topics in this video genre. Based on this function, we develop heuristic and probabilistic approaches to determine topic boundaries. We study the performance of the two methods on several training and lecture videos, and our experimental results demonstrate the effectiveness and robustness of these schemes.
    @TECHREPORT { phung_venkatesh_tr05,
        TITLE = { Structural Unit Identification and Segmentation of Topical Content in Educational Videos },
        AUTHOR = { Phung, Dinh and Venkatesh, Svetha },
        INSTITUTION = { Department of Computing, Curtin University of Technology },
        YEAR = { 2005 },
        NOTE = { TR-May-2005 },
        ABSTRACT = { Automatically structuralising educational video is a challenging problem for efficient content management and cataloging in E-learning environments. This paper addresses this problem and aims to achieve two objectives. First, we propose a hierarchy of narrative structures for this film genre based on a knowledge of production means for instructional media. We present a useful set of audiovisual features devised for this problem, along with a hierarchical decision tree-based classification system to determine and discriminate between these structures. The second goal is to partition educational video into topical sections. We propose a novel content density function to delineate sections underscored by changes in topics in this video genre. Based on this function, we develop heuristic and probabilistic approaches to determine topic boundaries. We study the performance of the two methods on several training and lecture videos, and our experimental results demonstrate the effectiveness and robustness of these schemes. },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
R
  • Factored State-Abstract Hidden Markov Models for Activity Recognition Using Pervasive Multi-modal Sensors
    Tran, D., Phung, D., Bui, H. and Venkatesh, S. In Int'l Conf. Intelligent Sensors, Sensor Networks and Information Processing, Melbourne, Australia, December 2005. [ | ]
    Current probabilistic models for activity recognition do not incorporate much sensory input data due to the problem of state space explosion. In this paper, we propose a model for activity recognition, called the Factored State-Abstract Hidden Markov Model (FS-AHMM), to allow us to integrate many sensors for improving recognition performance. The proposed FS-AHMM is an extension of the Abstract Hidden Markov Model which applies the concept of factored state representations to compactly represent the state transitions. The parameters of the FS-AHMM are estimated using the EM algorithm from the data acquired through multiple multi-modal sensors and cameras. The model is evaluated and compared with other existing models on real-world data. The results show that the proposed model outperforms other models and that the integrated sensor information helps in recognizing activity more accurately.
    @INPROCEEDINGS { tran_phung_bui_venkatesh_issnips05,
        TITLE = { Factored State-Abstract Hidden Markov Models for Activity Recognition Using Pervasive Multi-modal Sensors },
        AUTHOR = { Tran, D. and Phung, D. and Bui, H. and Venkatesh, S. },
        BOOKTITLE = { Int'l Conf. Intelligent Sensors, Sensor Networks and Information Processing },
        YEAR = { 2005 },
        ADDRESS = { Melbourne, Australia },
        MONTH = { December },
        ABSTRACT = { Current probabilistic models for activity recognition do not incorporate much sensory input data due to the problem of state space explosion. In this paper, we propose a model for activity recognition, called the Factored State-Abstract Hidden Markov Model (FS-AHMM), to allow us to integrate many sensors for improving recognition performance. The proposed FS-AHMM is an extension of the Abstract Hidden Markov Model which applies the concept of factored state representations to compactly represent the state transitions. The parameters of the FS-AHMM are estimated using the EM algorithm from the data acquired through multiple multi-modal sensors and cameras. The model is evaluated and compared with other existing models on real-world data. The results show that the proposed model outperforms other models and that the integrated sensor information helps in recognizing activity more accurately. },
        MODIFIED = { 2005-09-19 },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Hierarchical Hidden Markov Models with General State Hierarchy
    Bui, H., Phung, D. and Venkatesh, S. In Proc. of the National Conference on Artificial Intelligence (AAAI), pages 324-329, San Jose, California, USA, 2004. [ | ]
    The hierarchical hidden Markov model (HHMM) is an extension of the hidden Markov model to include a hierarchy of the hidden states. This form of hierarchical modeling has been found useful in applications such as handwriting recognition, behavior recognition, video indexing, and text retrieval. Nevertheless, the state hierarchy in the original HHMM is restricted to a tree structure. This prohibits two different states from having the same child, and thus does not allow for sharing of common substructures in the model. In this paper, we present a general HHMM in which the state hierarchy can be a lattice allowing arbitrary sharing of substructures. Furthermore, we provide a method for numerical scaling to avoid underflow, an important issue in dealing with long observation sequences. We demonstrate the working of our method in a simulated environment where a hierarchical behavioral model is automatically learned and later used for recognition.
    @INPROCEEDINGS { bui_phung_venkatesh_aaai04,
        TITLE = { Hierarchical Hidden Markov Models with General State Hierarchy },
        AUTHOR = { Bui, H. and Phung, D. and Venkatesh, S. },
        BOOKTITLE = { Proc. of the National Conference on Artificial Intelligence (AAAI) },
        YEAR = { 2004 },
        ADDRESS = { San Jose, California, USA },
        EDITOR = { McGuinness, Deborah L. and Ferguson, George },
        PAGES = { 324--329 },
        PUBLISHER = { {AAAI} Press / The {MIT} Press },
        ABSTRACT = { The hierarchical hidden Markov model (HHMM) is an extension of the hidden Markov model to include a hierarchy of the hidden states. This form of hierarchical modeling has been found useful in applications such as handwriting recognition, behavior recognition, video indexing, and text retrieval. Nevertheless, the state hierarchy in the original HHMM is restricted to a tree structure. This prohibits two different states from having the same child, and thus does not allow for sharing of common substructures in the model. In this paper, we present a general HHMM in which the state hierarchy can be a lattice allowing arbitrary sharing of substructures. Furthermore, we provide a method for numerical scaling to avoid underflow, an important issue in dealing with long observation sequences. We demonstrate the working of our method in a simulated environment where a hierarchical behavioral model is automatically learned and later used for recognition. },
        GROUP = { Statistics, Hierarchical Hidden Markov Models (HMM,HHMM) },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Content Structure Discovery in Educational Videos with Shared Structures in the Hierarchical HMMs
    Phung, D., Bui, H. and Venkatesh, S. In Joint Int. Workshop on Syntactic and Structural Pattern Recognition, pages 1155-1163, Lisbon, Portugal, August 18--20 2004. [ | ]
    In this paper, we present an application of the hierarchical HMM for structure discovery in educational videos. The HHMM has recently been extended to accommodate the concept of shared structure, i.e., a state may inherit from more than one parent. Utilising the expressiveness of this model, we concentrate on a specific class of video, educational videos, in which the hierarchy of semantic units is simpler and clearly defined in terms of topics and their subunits. We model the hierarchy of topical structures by an HHMM and demonstrate the usefulness of the model in detecting topic transitions.
    @INPROCEEDINGS { phung_bui_venkatesh_sspr04,
        TITLE = { Content Structure Discovery in Educational Videos with Shared Structures in the Hierarchical {HMM}s },
        AUTHOR = { Phung, D. and Bui, H. and Venkatesh, S. },
        BOOKTITLE = { Joint Int. Workshop on Syntactic and Structural Pattern Recognition },
        YEAR = { 2004 },
        ADDRESS = { Lisbon, Portugal },
        MONTH = { August 18--20 },
        PAGES = { 1155--1163 },
        ABSTRACT = { In this paper, we present an application of the hierarchical HMM for structure discovery in educational videos. The HHMM has recently been extended to accommodate the concept of shared structure, i.e., a state may inherit from more than one parent. Utilising the expressiveness of this model, we concentrate on a specific class of video, educational videos, in which the hierarchy of semantic units is simpler and clearly defined in terms of topics and their subunits. We model the hierarchy of topical structures by an HHMM and demonstrate the usefulness of the model in detecting topic transitions. },
        AUTHORURL = { (Also available in {\em Lecture Notes in Computer Science: Structural, Syntactic, and Statistical Pattern Recognition}, Vol. 3138, p. 1155, Springer-Verlag) },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Automatically Learning Structural Units in Educational Videos Using the Hierarchical HMMs
    Phung, D., Venkatesh, S. and Bui, H. In International Conference on Image Processing (ICIP), Singapore, 2004. [ | ]
    In this paper we present a coherent approach using the hierarchical HMM with shared structures to extract the structural units that form the building blocks of an education/training video. Rather than using hand-crafted approaches to define the structural units, we use the data from nine training videos to learn the parameters of the HHMM, and thus naturally extract the hierarchy. We then study this hierarchy and examine the nature of the structure at different levels of abstraction. Since the observable is continuous, we also show how to extend the parameter learning in the HHMM to deal with continuous observations.
    @INPROCEEDINGS { phung_venkatesh_bui_icip04,
        TITLE = { Automatically Learning Structural Units in Educational Videos Using the Hierarchical {HMM}s },
        AUTHOR = { Phung, D. and Venkatesh, S. and Bui, H. },
        BOOKTITLE = { International Conference on Image Processing (ICIP) },
        YEAR = { 2004 },
        ADDRESS = { Singapore },
        ABSTRACT = { In this paper we present a coherent approach using the hierarchical HMM with shared structures to extract the structural units that form the building blocks of an education/training video. Rather than using hand-crafted approaches to define the structural units, we use the data from nine training videos to learn the parameters of the HHMM, and thus naturally extract the hierarchy. We then study this hierarchy and examine the nature of the structure at different levels of abstraction. Since the observable is continuous, we also show how to extend the parameter learning in the HHMM to deal with continuous observations. },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Hierarchical Topic Segmentation in Instructional Films Based on Cinematic Expressive Functions
    Phung, D., Venkatesh, S. and Dorai, C. In ACM International Conference on Multimedia, pages 287-290, Berkeley, USA, 2-8 November 2003. [ | ]
    In this paper, we propose a novel solution for segmenting an instructional video into hierarchical topical sections. Incorporating the knowledge of education-oriented film theory with our previous study of expressive functions, namely the content density and thematic functions, we develop an algorithm to effectively structuralize an instructional video into a two-tiered hierarchy of topical sections at the main and sub-topic levels. Our experimental results on a set of ten industrial instructional videos demonstrate the validity of the detection scheme.
    @INPROCEEDINGS { phung_venkatesh_dorai_acmmm03,
        TITLE = { Hierarchical Topic Segmentation in Instructional Films Based on Cinematic Expressive Functions },
        AUTHOR = { Phung, D. and Venkatesh, S. and Dorai, C. },
        BOOKTITLE = { ACM International Conference on Multimedia },
        YEAR = { 2003 },
        ADDRESS = { Berkeley, USA },
        MONTH = { 2-8 November },
        PAGES = { 287--290 },
        ABSTRACT = { In this paper, we propose a novel solution for segmenting an instructional video into hierarchical topical sections. Incorporating the knowledge of education-oriented film theory with our previous study of expressive functions, namely the content density and thematic functions, we develop an algorithm to effectively structuralize an instructional video into a two-tiered hierarchy of topical sections at the main and sub-topic levels. Our experimental results on a set of ten industrial instructional videos demonstrate the validity of the detection scheme. },
        GROUP = { Video, CMA, Topic Detection, Expressive Functions },
        KEYWORDS = { Instructional films, topic detection/segmentation, media aesthetics, cinematic expressive functions, narrative structure },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • On Extraction of Thematic and Dramatic Functions in Educational Films
    Phung, D., Venkatesh, S. and Dorai, C. In {IEEE} International Conference on Multimedia and Expo, pages 449-452, Baltimore, Maryland, USA, 6-9 July 2003. [ | ]
    In this paper, we propose novel computational models for the extraction of high-level expressive constructs, namely the {\em thematic} and {\em dramatic} functions of the content shown in educational and training videos. Drawing on the existing knowledge of film theory, and on the media production rules and conventions used by filmmakers, we hypothesize key aesthetic elements that convey these functions of the content. Computational models to extract them are then formulated, and their performance, evaluated on a set of ten educational and training videos, is presented.
    @INPROCEEDINGS { phung_venkatesh_dorai_icme03,
        TITLE = { On Extraction of Thematic and Dramatic Functions in Educational Films },
        AUTHOR = { Phung, D. and Venkatesh, S. and Dorai, C. },
        BOOKTITLE = { {IEEE} International Conference on Multimedia and Expo },
        YEAR = { 2003 },
        ADDRESS = { Baltimore, Maryland, USA },
        MONTH = { 6-9 July },
        PAGES = { 449--452 },
        ABSTRACT = { In this paper, we propose novel computational models for the extraction of high-level expressive constructs, namely the {\em thematic} and {\em dramatic} functions of the content shown in educational and training videos. Drawing on the existing knowledge of film theory, and on the media production rules and conventions used by filmmakers, we hypothesize key aesthetic elements that convey these functions of the content. Computational models to extract them are then formulated, and their performance, evaluated on a set of ten educational and training videos, is presented. },
        GROUP = { Video, CMA, Educational/Training Films },
        INDEX = { dp 18 },
        KEYWORDS = { Video, CMA, Educational/Training Films },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • High Level Segmentation of Instructional Videos Based on the Content Density Function
    Phung, D., Venkatesh, S. and Dorai, C. In ACM International Conference on Multimedia (ACM-MM), pages 295-298, Juan Les Pins, France, 1-6 December 2002. [ | ]
    Automatically partitioning instructional videos into topic sections is a challenging problem in e-learning environments for efficient content management and cataloging. This paper addresses this problem by proposing a novel density function to delineate sections underscored by changes in topics in instructional and training videos. The content density function draws guidance from the observation that topic boundaries coincide with the ebb and flow of the density of content shown in these videos. Based on this function, we propose two methods for high-level segmentation by determining topic boundaries. We study the performance of the two methods on eight training videos, and our experimental results demonstrate the effectiveness and robustness of the two proposed high-level segmentation algorithms for learning media.
    @INPROCEEDINGS { phung_venkatesh_dorai_acmmm02,
        TITLE = { High Level Segmentation of Instructional Videos Based on the Content Density Function },
        AUTHOR = { Phung, D. and Venkatesh, S. and Dorai, C. },
        BOOKTITLE = { ACM International Conference on Multimedia (ACM-MM) },
        YEAR = { 2002 },
        ADDRESS = { Juan Les Pins, France },
        MONTH = { 1-6 December },
        PAGES = { 295--298 },
        ABSTRACT = { Automatically partitioning instructional videos into topic sections is a challenging problem in e-learning environments for efficient content management and cataloging. This paper addresses this problem by proposing a novel density function to delineate sections underscored by changes in topics in instructional and training videos. The content density function draws guidance from the observation that topic boundaries coincide with the ebb and flow of the density of content shown in these videos. Based on this function, we propose two methods for high-level segmentation by determining topic boundaries. We study the performance of the two methods on eight training videos, and our experimental results demonstrate the effectiveness and robustness of the two proposed high-level segmentation algorithms for learning media. },
        GROUP = { Video, CMA, Segmentation },
        INDEX = { dp 17 },
        KEYWORDS = { Computational Media Aesthetics, CMA, content management, content density function, educational/training films/videos },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Narrative Structure Analysis with Education and Training Videos for E-Learning
    Phung, D., Dorai, C. and Venkatesh, S. In International Conference on Pattern Recognition (ICPR), pages 835-839, Quebec, Canada, 11-15 August 2002. [ | ]
    This paper deals with the problem of structuralizing education and training videos for high-level semantics extraction and nonlinear media presentation in e-learning applications. Drawing guidance from production knowledge in instructional media, we propose six main narrative structures employed in education and training videos for both motivation and demonstration during learning and practical training. We devise a powerful audiovisual feature set, accompanied by a hierarchical decision tree-based classification system to determine and discriminate between these structures. Based on a two-tiered hierarchical model, we demonstrate that we can achieve an accuracy of 84.7\% on a comprehensive set of education and training video data.
    @INPROCEEDINGS { phung_dorai_venkatesh_icpr02,
        TITLE = { Narrative Structure Analysis with Education and Training Videos for {E}-Learning },
        AUTHOR = { Phung, D. and Dorai, C. and Venkatesh, S. },
        BOOKTITLE = { International Conference on Pattern Recognition (ICPR) },
        YEAR = { 2002 },
        ADDRESS = { Quebec, Canada },
        MONTH = { 11-15 August },
        PAGES = { 835--839 },
        ABSTRACT = { This paper deals with the problem of structuralizing education and training videos for high-level semantics extraction and nonlinear media presentation in e-learning applications. Drawing guidance from production knowledge in instructional media, we propose six main narrative structures employed in education and training videos for both motivation and demonstration during learning and practical training. We devise a powerful audiovisual feature set, accompanied by a hierarchical decision tree-based classification system to determine and discriminate between these structures. Based on a two-tiered hierarchical model, we demonstrate that we can achieve an accuracy of 84.7\% on a comprehensive set of education and training video data. },
        GROUP = { Video, CMA, Narrative Structure, E-learning },
        INDEX = { dp 15 },
        KEYWORDS = { Narrative Structure Analysis, E-learning, Computational Media Aesthetics, CMA, content management, content density function, educational films },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • Video Genre Categorization Using Audio Wavelet Coefficients
    Phung, D., Dorai, C. and Venkatesh, S. In The Fifth Asian Conference on Computer Vision, pages 69-74, Melbourne, Australia, 23-25 January 2002. [ | ]
    In this paper, we investigate the use of a wavelet transform-based analysis of audio tracks accompanying videos for the problem of automatic program genre detection. We compare the classification performance based on wavelet-based audio features to that using conventional features derived from Fourier and time analysis for the task of discriminating TV programs such as news, commercials, music shows, concerts, motor racing games, and animated cartoons. Three different classifiers, namely Decision Trees, SVMs, and K-Nearest Neighbours, are studied to analyse the reliability of the performance of our wavelet feature-based approach. Further, we investigate the issue of an appropriate duration of an audio clip to be analyzed for this automatic genre determination. Our experimental results show that features derived from the wavelet transform of the audio signal can very well separate the six video genres studied. It is also found that there is no significant difference in performance with varying audio clip durations across the classifiers.
    @INPROCEEDINGS { phung_dorai_venkatesh_accv02,
        TITLE = { Video Genre Categorization Using Audio Wavelet Coefficients },
        AUTHOR = { Phung, D. and Dorai, C. and Venkatesh, S. },
        BOOKTITLE = { The Fifth Asian Conference on Computer Vision },
        YEAR = { 2002 },
        ADDRESS = { Melbourne, Australia },
        MONTH = { 23-25 January },
        PAGES = { 69--74 },
        ABSTRACT = { In this paper, we investigate the use of a wavelet transform-based analysis of audio tracks accompanying videos for the problem of automatic program genre detection. We compare the classification performance based on wavelet-based audio features to that using conventional features derived from Fourier and time analysis for the task of discriminating TV programs such as news, commercials, music shows, concerts, motor racing games, and animated cartoons. Three different classifiers, namely Decision Trees, SVMs, and K-Nearest Neighbours, are studied to analyse the reliability of the performance of our wavelet feature-based approach. Further, we investigate the issue of an appropriate duration of an audio clip to be analyzed for this automatic genre determination. Our experimental results show that features derived from the wavelet transform of the audio signal can very well separate the six video genres studied. It is also found that there is no significant difference in performance with varying audio clip durations across the classifiers. },
        GROUP = { Video, Audio, Sound, Wavelets },
        INDEX = { dp 14 },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
C
  • An Investigation into Audio for Content Annotation
    Phung, Dinh. Honours thesis, Department of Computing, Curtin University of Technology, 2001. [ | ]
    Content-based segmentation and classification in audio and its use for video genre identification are two areas of investigation in this dissertation. Segmentation and classification of audio is the first step required for developing audio database management systems supporting automatic content indexing and retrieval. The primary purpose is to segment an audio stream into meaningful units to be indexed. Video genre classification, on the other hand, enables efficient cataloging and retrieval in large video databases. We propose an improved algorithm for audio boundary detection. Several important aspects of audio segmentation are investigated, including the issue of clip duration and its effect on classification accuracy. Classification is attempted for a relatively new set of audio classes consisting of pure speech, speech with music background, instrumental music, lyric, and audio in sport programs. We also propose a smoothing scheme to improve the classification results. With respect to video genre identification, while most previous research efforts have focused on visual information, this dissertation aims to use only information from audio tracks accompanying video data for recognition. We extract a rich set of features from audio signals and use them for video genre identification. Experimental results show the robustness of the features to reliably distinguish six types of video genre: news, commercials, concerts, shows, motor racing programs and animated cartoons.
    @MASTERSTHESIS { phung_honours01_investigation,
        TITLE = { An Investigation into Audio for Content Annotation },
        AUTHOR = { Phung, Dinh },
        SCHOOL = { Department of Computing, Curtin University of Technology },
        YEAR = { 2001 },
        TYPE = { Honours thesis },
        ABSTRACT = { Content-based segmentation and classification in audio and its use for video genre identification are two areas of investigation in this dissertation. Segmentation and classification of audio is the first step required for developing audio database management systems supporting automatic content indexing and retrieval. The primary purpose is to segment an audio stream into meaningful units to be indexed. Video genre classification, on the other hand, enables efficient cataloging and retrieval in large video databases. We propose an improved algorithm for audio boundary detection. Several important aspects of audio segmentation are investigated, including the issue of clip duration and its effect on classification accuracy. Classification is attempted for a relatively new set of audio classes consisting of pure speech, speech with music background, instrumental music, lyric, and audio in sport programs. We also propose a smoothing scheme to improve the classification results. With respect to video genre identification, while most previous research efforts have focused on visual information, this dissertation aims to use only information from audio tracks accompanying video data for recognition. We extract a rich set of features from audio signals and use them for video genre identification. Experimental results show the robustness of the features to reliably distinguish six types of video genre: news, commercials, concerts, shows, motor racing programs and animated cartoons. },
        OWNER = { 184698H },
        TIMESTAMP = { 2010.08.11 },
    }
MT