Causal discovery aims to infer cause-and-effect relationships from observational data, a crucial step beyond statistical correlation. A prominent method for this is the Linear Non-Gaussian Acyclic Model (LiNGAM), which can uniquely identify the causal structure by assuming linear relationships and non-Gaussian noise. LiNGAM-based algorithms typically depend on two key components: a search algorithm to determine the causal ordering of variables, and an independence measure to guide the search. Recent work, LiNGAM-MMI, proposed that replacing the simple greedy search with a global, shortest-path search led to superior performance, particularly when unmeasured common causes (confounders) are present. However, the claim was based on experiments that also modified the independence measure from the original baseline, making it difficult to isolate the source of the improvement.
To address this, we perform extensive experiments to test whether the search algorithm was truly the driver of performance, hypothesizing that the choice of independence measure is the dominant factor. In particular, we introduce a unified beam search framework that serves as both an analytical tool to disentangle these components and a practical algorithm with a scalable performance-complexity trade-off. Our simulations comparing the kNN-based Copula Entropy (CopEnt) with the Pairwise Likelihood Ratio (PLR) establish that the independence measure is the dominant factor, to the extent that a simple greedy search with a more effective measure, PLR, outperforms a global search with a less effective one, CopEnt, on general graphs. Furthermore, we find no evidence that the benefit of a more complex search algorithm is specific to handling unmeasured confounders, suggesting it instead serves to overcome general estimation errors arising from finite data. Finally, we demonstrate that the strong performance with CopEnt reported in the previous work was an artifact of a simplistic experimental setup, as its performance advantage is reversed on more realistic and complex structures, including Erdős-Rényi (ER) and Scale-Free (SF) networks.
@article{ong2026lingammmi,author={Ong, Hans Jarett J. and Lim, Brian Godwin S. and Tan, Renzo Roel P. and Ikeda, Kazushi},title={Measure Over Search: A Critical Re-evaluation of the Roles of Search and Independence Measure in {LiNGAM}-based Causal Discovery},journal={IEICE Trans. Information and Systems},year={2026},}
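The relationship between greedy search, beam search, and a global shortest-path search described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function `beam_search_ordering`, the `step_cost` callback, and the toy cost table are hypothetical stand-ins, and in practice `step_cost` would be a dependence score computed from data (for example, one based on CopEnt or PLR). A beam width of 1 reproduces greedy search, while a sufficiently large width explores every set of placed variables.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical signature: cost of placing `candidate` next, given the set already placed.
Cost = Callable[[FrozenSet[str], str], float]


def beam_search_ordering(
    variables: Sequence[str], step_cost: Cost, beam_width: int
) -> Tuple[float, List[str]]:
    """Search for a low-cost variable ordering, keeping `beam_width` states per step."""
    # Each state is the set of placed variables, mapped to its best (cost, ordering).
    beam: Dict[FrozenSet[str], Tuple[float, Tuple[str, ...]]] = {frozenset(): (0.0, ())}
    for _ in variables:
        best: Dict[FrozenSet[str], Tuple[float, Tuple[str, ...]]] = {}
        for placed, (cost, order) in beam.items():
            for v in variables:
                if v in placed:
                    continue
                new_cost = cost + step_cost(placed, v)
                key = placed | {v}
                # Orderings that reach the same set are merged; only the cheapest survives.
                if key not in best or new_cost < best[key][0]:
                    best[key] = (new_cost, order + (v,))
        # Prune to the `beam_width` cheapest states before expanding again.
        beam = dict(sorted(best.items(), key=lambda kv: kv[1][0])[:beam_width])
    cost, order = next(iter(beam.values()))
    return cost, list(order)


# Toy costs (illustrative only): placing "a" first looks cheapest but leads to a dead end.
TOY_COSTS = {
    (frozenset(), "a"): 1.0, (frozenset(), "b"): 2.0, (frozenset(), "c"): 5.0,
    (frozenset("a"), "b"): 10.0, (frozenset("a"), "c"): 10.0,
    (frozenset("b"), "a"): 1.0, (frozenset("b"), "c"): 3.0,
    (frozenset("c"), "a"): 5.0, (frozenset("c"), "b"): 5.0,
    (frozenset("ab"), "c"): 1.0, (frozenset("ac"), "b"): 1.0, (frozenset("bc"), "a"): 1.0,
}


def toy_cost(placed: FrozenSet[str], candidate: str) -> float:
    return TOY_COSTS[(placed, candidate)]


VARS = ["a", "b", "c"]
print(beam_search_ordering(VARS, toy_cost, 1))  # (12.0, ['a', 'b', 'c'])  greedy
print(beam_search_ordering(VARS, toy_cost, 2))  # (4.0, ['b', 'a', 'c'])
```

In this toy case the greedy search commits to the locally cheapest first step and cannot recover, whereas a width of 2 keeps the alternative alive long enough to find the cheaper ordering.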
IEEE Access
A Bayesian Monte Carlo Variational Inference Estimation Procedure for Dynamic Factor Models on Stock Price Returns
Benedict Ryan Tiu, Dominic Dayta, Hans Jarett Ong, and 2 more authors
Dynamic factor models (DFMs) provide a framework for distilling high-dimensional time series data into a small set of unobserved latent factors. Traditional statistical methods for estimating DFMs are often computationally intensive and can be inflexible when adapting to non-linear model extensions. In this work, we propose a modular Bayesian Monte Carlo variational inference (MCVI) estimation procedure for DFMs designed to prioritize flexibility and extensibility. We demonstrate that in the context of the Philippine Stock Exchange, the estimated latent factors and loadings qualitatively align with findings obtained through conventional methods. To illustrate the modularity of the framework, we extend the DFM to incorporate a diagonal BEKK-GARCH structure, which reveals non-trivial volatility clustering in the latent factors. While the adoption of a mean-field variational approximation entails certain trade-offs in posterior accuracy and computational overhead relative to specialized linear estimators, the decoupling of the model specification from the inference engine allows for the rapid integration of complex layers without requiring model-specific re-derivations. Overall, this work establishes an extensible Bayesian framework that facilitates more nuanced investigations into the dynamic and heteroskedastic nature of systematic risk in financial markets.
@article{tiu2026bayesianDFM,author={Tiu, Benedict Ryan and Dayta, Dominic and Ong, Hans Jarett and Lim, Brian Godwin and Ikeda, Kazushi},title={A Bayesian Monte Carlo Variational Inference Estimation Procedure for Dynamic Factor Models on Stock Price Returns},journal={IEEE Access},year={2026},}
IEEE Access
Towards Unsupervised Causal Representation Learning via Latent Additive Noise Model Causal Autoencoders
Hans Jarett J. Ong, Brian Godwin S. Lim, Dominic Dayta, and 2 more authors
Unsupervised representation learning seeks to recover latent generative factors, yet standard methods relying on statistical independence often fail to capture causal dependencies. A central challenge is identifiability: as established in disentangled representation learning and nonlinear ICA literature, disentangling causal variables from observational data is impossible without supervision, auxiliary signals, or strong inductive biases. In this work, we propose the Latent Additive Noise Model Causal Autoencoder (LANCA) to operationalize the Additive Noise Model (ANM) as a strong inductive bias for unsupervised discovery. Theoretically, we prove that while the ANM constraint does not guarantee unique identifiability in the general mixing case, it resolves component-wise indeterminacy by restricting the admissible transformations from arbitrary diffeomorphisms to the affine class. Methodologically, arguing that the stochastic encoding inherent to VAEs obscures the structural residuals required for latent causal discovery, LANCA employs a deterministic Wasserstein Auto-Encoder (WAE) coupled with a differentiable ANM Layer. This architecture transforms residual independence from a passive assumption into an explicit optimization objective. Empirically, LANCA outperforms state-of-the-art baselines on synthetic physics benchmarks (Pendulum, Flow), and on photorealistic environments (CANDLE), where it demonstrates superior robustness to spurious correlations arising from complex background scenes.
@inproceedings{ong2025lanca,title={Towards Unsupervised Causal Representation Learning via Latent Additive Noise Model Causal Autoencoders},author={Ong, Hans Jarett J. and Lim, Brian Godwin S. and Dayta, Dominic and Tan, Renzo Roel P. and Ikeda, Kazushi},year={2026},booktitle={IEEE Access},note={Under Review},selected=true}
UAI
MetaCaDI: A Meta-Learning Framework for Causal Discovery from Multiple Environments with Unknown Interventions
Hans Jarett J. Ong, Yoichi Chikahara, and Tomoharu Iwata
In Forty-Second Annual Conference on Uncertainty in Artificial Intelligence (UAI), 2026
Uncovering the underlying causal mechanisms of complex real-world systems remains a significant challenge, as these systems often entail high data collection costs and involve unknown interventions. We introduce MetaCaDI, the first framework to cast the joint discovery of a causal graph and unknown interventions as a meta-learning problem. MetaCaDI is a Bayesian framework that learns a shared causal graph structure across multiple experiments and is optimized to rapidly adapt to new, few-shot intervention target prediction tasks. A key innovation is our model’s analytical adaptation, which uses a closed-form solution to bypass expensive and potentially unstable gradient-based bilevel optimization. Extensive experiments on synthetic and complex gene expression data demonstrate that MetaCaDI significantly outperforms state-of-the-art methods. It excels at both causal graph recovery and identifying intervention targets from as few as 10 data instances, proving its robustness in data-scarce scenarios.
@inproceedings{ong2026metacadi,title={{MetaCaDI}: A Meta-Learning Framework for Causal Discovery from Multiple Environments with Unknown Interventions},author={Ong, Hans Jarett J. and Chikahara, Yoichi and Iwata, Tomoharu},year={2026},booktitle={Forty-Second Annual Conference on Uncertainty in Artificial Intelligence (UAI)},note={Under Review},selected=true}
2025
AROB
Causal discovery in Additive Noise Models using beam search
Hans Jarett J. Ong, Brian Godwin S. Lim, Renzo Roel P. Tan, and 1 more author
Causal discovery from observational data is a fundamental challenge. Greedy search algorithms like Regression with Subsequent Independence Test (RESIT), commonly used for learning Additive Noise Models (ANMs), are susceptible to making irreversible errors, especially in high-variance contexts. Such settings can be caused by unmeasured confounders or by high statistical noise from finite samples. To address this, we introduce a novel generalization of RESIT that replaces its local, greedy search with a more robust beam search, framing the task as a path search on a state-space graph. Through extensive simulation experiments, we demonstrate that structural accuracy, measured by Structural Hamming Distance (SHD) and Structural Intervention Distance (SID), consistently improves as the beam width (w) increases. Crucially, we also show that this performance gain comes at a manageable, approximately linear increase in computational cost relative to w. Furthermore, our analysis across different sample sizes shows these gains are most statistically significant in intermediate regimes (n=250,500). This suggests that at these sample sizes, the statistical noise is high enough to mislead the greedy search into a suboptimal ordering, an error our wider beam search corrects, while performance converges at large sample sizes (n=1000). Our framework provides a practical, tunable algorithm that bridges the gap between fast but brittle local search methods and computationally infeasible global searches, thereby enhancing the reliability of causal discovery in complex, high-variance settings where such local errors are common.
@article{ong2025resitmmi,author={Ong, Hans Jarett J. and Lim, Brian Godwin S. and Tan, Renzo Roel P. and Ikeda, Kazushi},title={Causal discovery in Additive Noise Models using beam search},journal={Artificial Life and Robotics},year={2025},pages={317-326},volume={31},selected=true}
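As a small illustration of the Structural Hamming Distance (SHD) used as an accuracy metric in the abstract above, the sketch below compares two directed graphs given as adjacency matrices. It assumes the convention that `adj[i][j] == 1` means an edge from node i to node j, and it counts a reversed edge as a single error; other SHD variants count a reversal twice. The function name and example graphs are hypothetical.

```python
from typing import List


def structural_hamming_distance(true_adj: List[List[int]], est_adj: List[List[int]]) -> int:
    """Count node pairs whose edge status (absent, i->j, or j->i) differs between graphs."""
    n = len(true_adj)
    distance = 0
    for i in range(n):
        for j in range(i + 1, n):
            true_pair = (true_adj[i][j], true_adj[j][i])
            est_pair = (est_adj[i][j], est_adj[j][i])
            # A missing, extra, or reversed edge each counts as one error.
            if true_pair != est_pair:
                distance += 1
    return distance


# True graph: 0 -> 1 -> 2. Estimate: 0 -> 1, 2 -> 1 (reversed), 0 -> 2 (extra).
TRUE_GRAPH = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
ESTIMATED_GRAPH = [[0, 1, 1], [0, 0, 0], [0, 1, 0]]
print(structural_hamming_distance(TRUE_GRAPH, ESTIMATED_GRAPH))  # 2
```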
ICONIP
A Compression-Based Dependence Measure for Causal Discovery by Additive Noise Models
Hans Jarett J. Ong, Brian Godwin S. Lim, Benedict Ryan C. Tiu, and 2 more authors
In this work, we introduce a novel compression-based dependence measure (CDM) for causal discovery. Our proposed measure leverages data compression to quantify dependence, offering a new approach that is effective even with small data sizes. Through extensive simulations with general additive noise models, causal additive models, and linear non-Gaussian acyclic models, we demonstrate the relative superiority of CDM over existing methods. Additionally, we validate our approach using the cause-effect pairs benchmark dataset, where CDM shows comparable accuracy across various sample sizes. To close, we discuss the sensitivity of CDM to data scales, an issue shared by other causal discovery methods. Despite this, CDM presents a promising way to take advantage of data compression for causal discovery.
@inproceedings{ong2025iconip,title={A Compression-Based Dependence Measure for Causal Discovery by Additive Noise Models},author={Ong, Hans Jarett J. and Lim, Brian Godwin S. and Tiu, Benedict Ryan C. and Tan, Renzo Roel P. and Ikeda, Kazushi},year={2025},booktitle={Neural Information Processing},pages={61-75},}
IJCNN
FinSIR: Financial SIR-GCN for Market-Aware Stock Recommendation
Brian Godwin Lim, Jiahong Liu, Hans Jarett Ong, and 4 more authors
In 2025 International Joint Conference on Neural Networks (IJCNN), 2025
Existing works on stock price prediction have largely treated stocks in a market independently of one another. Nevertheless, recent advances in graph neural networks (GNNs) have enabled the efficient processing of diverse stock relations. This paper introduces the Financial SIR-GCN (FinSIR) for market-aware stock price prediction and recommendation. By modeling stock markets as spatio-temporal graphs, FinSIR addresses the key architectural limitation of existing graph-based models. Notably, the proposed model integrates the soft-isomorphic relational graph convolution network (SIR-GCN) with the "sandwich" structure employed in GNN for time series analysis (GNN4TS) to jointly process the two key dimensions of stock market graphs and to contextualize hidden states with both spatial and temporal stock relations. Backtesting results on the New York Stock Exchange (NYSE) and the National Association of Securities Dealers Automatic Quotation System (NASDAQ) reveal FinSIR consistently achieving up to 65% and 36% larger cumulative investment returns, respectively, compared to baseline models. Additionally, an ablation study further highlights the contribution of each FinSIR module in providing better investment recommendations. Overall, the paper incorporates recent advances in GNN and GNN4TS to provide a new perspective on graph-based solutions for improved stock price prediction and recommendation.
@inproceedings{lim2025finsir,title={{FinSIR}: Financial {SIR-GCN} for Market-Aware Stock Recommendation},author={Lim, Brian Godwin and Liu, Jiahong and Ong, Hans Jarett and Chan, Jan Adrian and Tan, Renzo Roel and King, Irwin and Ikeda, Kazushi},year={2025},booktitle={2025 International Joint Conference on Neural Networks (IJCNN)},note={In Press},}
2024
ICACSE
Dynamic Principal Component Analysis for the Construction of High-Frequency Economic Indicators
Brian Godwin Lim, Hans Jarett Ong, Renzo Roel Tan, and 1 more author
In Proceedings of the 4th International Conference on Advances in Computational Science and Engineering, 2024
Recent progress in data analysis and machine learning has enabled the efficient processing of large data; however, the public sector has yet to fully adopt these advancements. The study investigates the application of dynamic principal component analysis in offering real-time insights into various facets of an economy, potentially aiding in the informed decision-making of policymakers. In brief, dynamic principal component analysis generates dynamic principal components representing latent factors that account for the autocovariance in time series data. In examining daily data from the Philippine stock exchange, Philippine peso exchange rates, and Philippine peso to United States dollar forward rates, results demonstrate the effectiveness of the first three dynamic principal components as high-frequency indicators for business and investment conditions, economic performance, and economic outlook, respectively. Moreover, an application of the isolation forest anomaly detection algorithm validates the sensitivity of the constructed indicators to systematic economic shocks, which identified events such as the taper tantrum of 2013 and the 2020 lockdown due to the novel coronavirus pandemic, among others. Overall, the practical applicability of the proposed methodology suggests potential extensions incorporating nontraditional data sources for more comprehensive economic indicators.
@inproceedings{lim2024dpca,title={Dynamic Principal Component Analysis for the Construction of High-Frequency Economic Indicators},author={Lim, Brian Godwin and Ong, Hans Jarett and Tan, Renzo Roel and Ikeda, Kazushi},year={2024},booktitle={Proceedings of the 4th International Conference on Advances in Computational Science and Engineering},pages={645-663},}
arXiv
Redefining the Shortest Path Problem Formulation of the Linear Non-Gaussian Acyclic Model: Pairwise Likelihood Ratios, Prior Knowledge, and Path Enumeration
Effective causal discovery is essential for learning the causal graph from observational data. The linear non-Gaussian acyclic model (LiNGAM) operates under the assumption of a linear data generating process with non-Gaussian noise in determining the causal graph. Its assumption of unmeasured confounders being absent, however, poses practical limitations. In response, empirical research has shown that the reformulation of LiNGAM as a shortest path problem (LiNGAM-SPP) addresses this limitation. Within LiNGAM-SPP, mutual information is chosen to serve as the measure of independence. This choice introduces a challenge: parameter tuning is now needed because of the reliance on kNN mutual information estimators. The paper proposes a threefold enhancement to the LiNGAM-SPP framework. First, the need for parameter tuning is eliminated by using the pairwise likelihood ratio in lieu of kNN-based mutual information. This substitution is validated on a general data generating process and benchmark real-world data sets, outperforming existing methods especially when given a larger set of features. The incorporation of prior knowledge is then enabled by a node-skipping strategy implemented on the graph representation of all causal orderings to eliminate violations based on the provided input of relative orderings. Flexibility relative to existing approaches is achieved. Last among the three enhancements is the utilization of the distribution of paths in the graph representation of all causal orderings. From this, crucial properties of the true causal graph such as the presence of unmeasured confounders and sparsity may be inferred. To some extent, the expected performance of the causal discovery algorithm may be predicted. The refinements above advance the practicality and performance of LiNGAM-SPP, showcasing the potential of graph-search-based methodologies in advancing causal discovery.
@article{ong2024redefining,title={Redefining the Shortest Path Problem Formulation of the Linear Non-Gaussian Acyclic Model: Pairwise Likelihood Ratios, Prior Knowledge, and Path Enumeration},author={Ong, Hans Jarett J and Lim, Brian Godwin S},journal={arXiv preprint arXiv:2404.11922},year={2024},}
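To make the tuning-free character of the pairwise likelihood ratio concrete, the sketch below computes one commonly cited first-order approximation, R = rho * E[x tanh(y) - tanh(x) y] on standardized variables, which is usually attributed to Hyvärinen and Smith. Whether the paper above uses exactly this variant is an assumption, not something stated in the abstract. Under this convention a positive value is read as evidence for x causing y when the noise is super-Gaussian. The function names and the simulated data are hypothetical.

```python
import math
import random
from statistics import fmean, pstdev
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Rescale to zero mean and unit variance."""
    mu = fmean(values)
    sigma = pstdev(values, mu)
    return [(v - mu) / sigma for v in values]


def pairwise_likelihood_ratio(x: Sequence[float], y: Sequence[float]) -> float:
    """Approximate likelihood ratio between x -> y and y -> x (no tuning parameters)."""
    xs, ys = standardize(x), standardize(y)
    rho = fmean(a * b for a, b in zip(xs, ys))  # sample correlation
    contrast = fmean(a * math.tanh(b) - math.tanh(a) * b for a, b in zip(xs, ys))
    return rho * contrast


def laplace_sample(rng: random.Random) -> float:
    """Draw from a Laplace distribution (a simple super-Gaussian noise source)."""
    return rng.expovariate(1.0) * rng.choice((-1.0, 1.0))


# Simulated linear non-Gaussian pair with x as the cause (illustrative data only).
rng = random.Random(0)
X_DATA = [laplace_sample(rng) for _ in range(5000)]
Y_DATA = [0.8 * x + laplace_sample(rng) for x in X_DATA]

print(pairwise_likelihood_ratio(X_DATA, Y_DATA))
print(pairwise_likelihood_ratio(Y_DATA, X_DATA))  # same magnitude, opposite sign
```

The statistic is antisymmetric by construction, so only one direction needs to be evaluated per pair, and unlike a kNN-based estimator it has no neighborhood size to choose.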
2019
Thesis
Using Mahalanobis Distance to Classify Aerosol in Southeast Asia based on AERONET-Retrieved Optical Properties
Aerosol types over Southeast Asia (SEA) are determined from Aerosol Robotic Network (AERONET)-derived aerosol optical properties for 25 sites using the Mahalanobis distance method. The Ångström exponent (AE), single scattering albedo (SSA), and real refractive index (n) are used in a three-dimensional specified clustering method that classifies aerosols into 7 classes, namely: biomass burning white smoke (BB-W), polluted dust (PD), urban industrial developing economy (UI-D), urban industrial (UI), biomass burning dark smoke (BB-D), mineral dust (MD), and marine aerosols. The results show that most of the 25 sites are dominated by PD and UI-D. Specifically, sites from Indonesia, Singapore, and a part of Malaysia are dominated by reflective aerosols like UI and UI-D; sites from Thailand, Philippines, Malaysia, and southern Vietnam are dominated by more absorbing aerosols like PD and UI-D; sites from northern Vietnam and Taiwan are dominated by coarse aerosols like PD and UI-D.
@thesis{Ong2019,author={Ong, Hans Jarett J.},title={Using Mahalanobis Distance to Classify Aerosol in Southeast Asia based on AERONET-Retrieved Optical Properties},school={Ateneo de Manila University},year={2019},type={Bachelor's Thesis},advisor={Nofel D. Lagrosas, Ph.D. and Clint Dominic G. Bennett},committee={Joel T. Maquiling, Ph.D. and Patricio P. Dailisan},note={Undergraduate Thesis},}
2018
JpGU
Aerosol Types from 25 Southeast Asian AERONET Sites Obtained Using Specified Clustering and Mahalanobis Distance
Hans Jarett J. Ong, Nofel Lagrosas, Uy Sherdon, and 21 more authors
This study aims to identify aerosol types over 25 southeast Asian sites using Aerosol Robotic Network (AERONET) level 2.0 inversion data in a five-dimensional specified classification method. The classification method makes use of the Mahalanobis distance in five dimensions to classify each point of the data to the closest reference cluster. This study relies on the fact that the method is scale-free and takes into account the obliqueness of the clusters. AERONET data from 7 sites is used to define 7 aerosol reference clusters: mineral dust (MD), polluted dust (PD), urban industrial (UI), urban industrial developing (UID), biomass burning white smoke (BBW), biomass burning dark smoke (BBD), and marine aerosols (Russell et al., 2014). These are applied on the following AERONET sites: Thailand (ChiangMaiMetSta, Mukdahan, Omkoi, SilpakornUniv, SongkhlaMetSta, UbonRatchathani); Singapore (Singapore); Vietnam (BacGiang, BacLieu, NGHIADO, NhaTrang); Philippines (ManilaObservatory, NDMarbelUniv); Taiwan (Chiayi, DongshaIsland, EPANCU, Lulin, NCUTaiwan, TaipeiCWB); Malaysia (Kuching, USMPenang); and Indonesia (Bandung, Jambi, Palangkaraya, Pontianak). The results of applying this method to the AERONET data from these sites show that the most dominant aerosol types in the region are PD, UID, and BBW. PD aerosols are characterized by mean Angstrom Exponent (AE) values of 1.19 (±0.238) and mean Single Scattering Albedo (SSA) values of 0.886 (±0.0400). UID aerosols are characterized by mean AE of 1.34 (±0.151) and mean SSA of 0.955 (±0.0249). BBW is characterized by mean AE of 1.87 (±0.144) and mean SSA of 0.925 (±0.0201). This implies that BBW aerosols are finer compared to PD and UID while PD aerosols are more absorbing compared to UID and BBW. The dominance of PD and UID aerosols may be attributed to vehicular emissions (with complete and incomplete combustions). The dominance of BBW in this region may be attributed to open burning of crop residues after harvesting. 
In this work, the sites where PD is most dominant are BacGiang (75%), BacLieu (63%), Bandung (55%), ChiangMaiMetSta (69%), DongshaIsland (52%), Lulin (38%), ManilaObservatory (77%), Mukdahan (58%), NDMarbelUniv (41%), NGHIADO (50%), NhaTrang (48%), Omkoi (38%), SilpakornUniv (77%), SongkhlaMetSta (69%), TaipeiCWB (54%), and USMPenang (55%). In these sites, PD is generally dominant during the months of January to May, although it is observed to be scattered throughout the year for some sites. The months of January to May are usually considered as dry months in some sites although exact dry months differ for each site. UID is the most dominant aerosol type in Chiayi (66%), EPANCU (50%), Jambi (50%), Kuching (47%), NCUTaiwan (56%), Palangkaraya (67%), Pontianak (48%), and Singapore (60%). BBW is most dominant in UbonRatchathani (38%), but is also found in significant amounts in Jambi (22%), Kuching (24%), Lulin (14%), Mukdahan (15%), NDMarbelUniv (21%), NhaTrang (23%), Omkoi (34%), USMPenang (11%). During September and October when biomass burning is common in the region, traces of BBW and BBD are found in Kuching, Pontianak, Singapore, and Taipei. UID type is commonly observed all throughout the year.
@misc{ong2022aerosol,author={Ong, Hans Jarett J. and Lagrosas, Nofel and Sherdon, Uy and Liz, Cruz and Glenn, Gacal and Jeffrey, Reid and Nguyen, Anh Xuan and Lestari, Puji and Janjai, Serm and Lin, Tang-Huang and Wang, Sheng-Hsiang (Carlo) and Kuo, Chun-Chiang (Ferret) and Chia, Hao-Ping (Eric) and Lin, Neng-Huei (George) and Holben, Brent N. and Mohamad, Maznorian and Mahmud, Mastura and Liew, Soo Chin and Liu, Gin-Rong and Dorado, Susana and Tobis, Victorino and Cortijo, Santo V. Salinas and Lin, Po-Hsiung and Lim, Hwee San},title={Aerosol Types from 25 Southeast Asian {AERONET} Sites Obtained Using Specified Clustering and Mahalanobis Distance},howpublished={Oral Presentation Abstract},year={2018},note={Japan Geoscience Union, Chiba, Japan},}
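The classification rule described in the abstract above, assigning each data point to the reference cluster with the smallest Mahalanobis distance, can be sketched in two dimensions. This is a simplified illustration rather than the five-dimensional method of the study: the cluster names, means, and covariances are hypothetical and are not AERONET values. The example shows why the cluster shape matters: the test point is equally far from both cluster centers in Euclidean terms, yet it is assigned to the elongated, tilted cluster.

```python
import math
from typing import Dict, Sequence, Tuple

Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def mahalanobis_2d(point: Sequence[float], mean: Sequence[float], cov: Matrix2) -> float:
    """Mahalanobis distance in 2-D, using the closed-form inverse of a 2x2 covariance."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Quadratic form diff^T * inv(cov) * diff, with inv(cov) = [[d, -b], [-c, a]] / det.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(quad)


def classify(point: Sequence[float], clusters: Dict[str, Tuple[Sequence[float], Matrix2]]) -> str:
    """Return the name of the reference cluster closest in Mahalanobis distance."""
    return min(clusters, key=lambda name: mahalanobis_2d(point, *clusters[name]))


# Hypothetical reference clusters: (mean, covariance).
CLUSTERS = {
    "round": ((0.0, 0.0), ((0.04, 0.0), (0.0, 0.04))),    # tight, isotropic
    "oblique": ((2.0, 2.0), ((1.0, 0.9), (0.9, 1.0))),    # elongated along the diagonal
}

POINT = (1.0, 1.0)  # Euclidean distance to both means is sqrt(2)
print(classify(POINT, CLUSTERS))  # oblique
```

Because the distance is normalized by each cluster's covariance, rescaling an input variable leaves the assignment unchanged, which is the scale-free property the abstract refers to.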
2016
AGU
Determination of Monthly Aerosol Types in Manila Observatory and Notre Dame of Marbel University from Aerosol Robotic Network (AERONET) measurements
Hans Jarett J. Ong, Nofel Lagrosas, Sherdon Niño Yu Uy, and 4 more authors
This study aims to identify aerosol types in Manila Observatory (MO) and Notre Dame of Marbel University (NDMU) using Aerosol Robotic Network (AERONET) Level 2.0 inversion data and five-dimensional specified clustering and Mahalanobis classification. The parameters used are the 440-870 nm extinction Ångström exponent (EAE), 440 nm single scattering albedo (SSA), 440-870 nm absorption Ångström exponent (AAE), and the 440 nm real and imaginary refractive indices. Specified clustering makes use of AERONET data from 7 sites to define 7 aerosol classes: mineral dust (MD), polluted dust (PD), urban industrial (UI), urban industrial developing (UID), biomass burning white smoke (BBW), biomass burning dark smoke (BBD), and marine aerosols. This is similar to the classes used by Russell et al, 2014. A data point is classified into a class based on the closest 5-dimensional Mahalanobis distance (Russell et al, 2014 & Hamill et al, 2016). This method is applied to all 173 MO data points from January 2009 to June 2015 and to all 24 NDMU data points from December 2009 to July 2015 to look at monthly and seasonal variations of aerosol types. The MO and NDMU aerosols are predominantly PD ( 77%) and PD & UID ( 75%) respectively (Figs.1a-b); PD is predominant in the months of February to May in MO and February to March in NDMU. PD results from less strict emission and environmental regulations (Catrall 2005). The average SSA value in MO is comparable to the mean SSA for PD ( 0.89). This can be attributed to the presence of highly absorbing aerosol types, e.g., carbon, which is a product of transportation emissions. The second most dominant aerosol type in MO is UID ( 15%); in NDMU it is BBW ( 25%). In Manila, the main source of PD and UID (fine particles) is generally vehicular combustion (Oanh, et al 2006). The detection of BBW in MO from April to May can be attributed to the fires which are common in these dry months. In NDMU, the source of BBW is biomass burning (smoldering).
In this analysis, smoke from biomass burning transported from other Southeast Asian countries is not observed because of the low number of inversion data points. However, fine mode AOD values in NDMU from September to October can be greater than 1, which implies detection of this transported biomass burning smoke.
@misc{Ong:2016,author={Ong, Hans Jarett J. and Lagrosas, Nofel and Uy, Sherdon Niño Yu and Gacal, Glenn Franco Barroso and Dorado, Susana and Tobias, Victorino and Holben, Brent N},title={Determination of Monthly Aerosol Types in Manila Observatory and Notre Dame of Marbel University from Aerosol Robotic Network (AERONET) measurements},howpublished={Oral Presentation Abstract},year={2016},note={AGU Fall Meeting, San Francisco, California, USA},}