publications | Hans Jarett Ong

2024

Preprint

Redefining the Shortest Path Problem Formulation of the Linear Non-Gaussian Acyclic Model: Pairwise Likelihood Ratios, Prior Knowledge, and Path Enumeration

Hans Jarett J Ong and Brian Godwin S Lim

arXiv preprint arXiv:2404.11922, 2024

Effective causal discovery is essential for learning the causal graph from observational data. The linear non-Gaussian acyclic model (LiNGAM) operates under the assumption of a linear data generating process with non-Gaussian noise in determining the causal graph. Its assumption of unmeasured confounders being absent, however, poses practical limitations. In response, empirical research has shown that the reformulation of LiNGAM as a shortest path problem (LiNGAM-SPP) addresses this limitation. Within LiNGAM-SPP, mutual information is chosen to serve as the measure of independence. This choice introduces a challenge: parameter tuning is now needed because of the reliance on kNN mutual information estimators. The paper proposes a threefold enhancement to the LiNGAM-SPP framework. First, the need for parameter tuning is eliminated by using the pairwise likelihood ratio in lieu of kNN-based mutual information. This substitution is validated on a general data generating process and benchmark real-world data sets, outperforming existing methods especially when given a larger set of features. Second, the incorporation of prior knowledge is enabled by a node-skipping strategy implemented on the graph representation of all causal orderings, which eliminates orderings that violate the provided relative orderings. This achieves greater flexibility than existing approaches. Third, the distribution of paths in the graph representation of all causal orderings is utilized. From this, crucial properties of the true causal graph such as the presence of unmeasured confounders and sparsity may be inferred. To some extent, the expected performance of the causal discovery algorithm may be predicted. The refinements above advance the practicality and performance of LiNGAM-SPP, showcasing the potential of graph-search-based methodologies in advancing causal discovery.
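
The graph-search idea in this abstract can be illustrated with a minimal Python sketch. It assumes that each node is the set of variables already placed in the causal ordering and that each edge appends one variable at some non-negative cost. The names `shortest_ordering`, `violates`, and `toy_cost` are hypothetical, and the toy cost merely stands in for the paper's pairwise-likelihood-ratio-based measure, which is not reproduced here. Prior knowledge of the form "a precedes b" is handled by skipping any node that contains b without a.

```python
import heapq
from itertools import count


def violates(subset, prior):
    # prior holds pairs (a, b) meaning "a must come before b".
    # A node containing b but not a cannot lie on any valid ordering.
    return any(b in subset and a not in subset for a, b in prior)


def shortest_ordering(variables, step_cost, prior=()):
    """Dijkstra search over subsets of variables (assumed graph layout)."""
    start, goal = frozenset(), frozenset(variables)
    tie = count()  # tie-breaker so the heap never compares frozensets
    heap = [(0.0, next(tie), start, ())]
    best = {start: 0.0}
    while heap:
        cost, _, placed, order = heapq.heappop(heap)
        if placed == goal:
            return list(order), cost
        if cost > best.get(placed, float("inf")):
            continue  # stale heap entry
        for v in variables:
            if v in placed:
                continue
            nxt = placed | {v}
            if violates(nxt, prior):
                continue  # node skipping: this node breaks the prior knowledge
            new_cost = cost + step_cost(placed, v)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, next(tie), nxt, order + (v,)))
    return None, float("inf")


# Hypothetical cost: cheapest when variables are placed in the order x, y, z.
RANK = {"x": 0, "y": 1, "z": 2}


def toy_cost(placed, v):
    return float(abs(RANK[v] - len(placed)))


free_order, free_cost = shortest_ordering(["x", "y", "z"], toy_cost)
constrained_order, constrained_cost = shortest_ordering(
    ["x", "y", "z"], toy_cost, prior=[("z", "x")]
)
```

Without prior knowledge the search returns the cheapest ordering; with the constraint "z precedes x" it returns the cheapest ordering among those that respect the constraint.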

2023

Conference Proceedings

Dynamic Principal Component Analysis for the Construction of High-Frequency Economic Indicators

Brian Godwin Lim, Hans Jarett Ong, Renzo Roel Tan, and 1 more author

2023

Recent progress in data analysis and machine learning has enabled the efficient processing of large data; however, the public sector has yet to fully adopt these advancements. The study investigates the application of dynamic principal component analysis in offering real-time insights into various facets of an economy, potentially aiding in the informed decision-making of policymakers. In brief, dynamic principal component analysis generates dynamic principal components representing latent factors that account for the autocovariance in time series data. In examining daily data from the Philippine stock exchange, Philippine peso exchange rates, and Philippine peso to United States dollar forward rates, results demonstrate the effectiveness of the first three dynamic principal components as high-frequency indicators for business and investment conditions, economic performance, and economic outlook, respectively. Moreover, an application of the isolation forest anomaly detection algorithm validates the sensitivity of the constructed indicators to systematic economic shocks: the algorithm identified events such as the taper tantrum of 2013 and the 2020 lockdown due to the novel coronavirus pandemic, among others. Overall, the practical applicability of the proposed methodology suggests potential extensions incorporating nontraditional data sources for more comprehensive economic indicators.
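
As a rough illustration of how principal components can be made to account for autocovariance, the sketch below uses the lag-stacking variant of dynamic PCA: each observation is augmented with its own lagged values before an ordinary PCA is applied. This is an assumption made for illustration and may differ from the formulation used in the study; the function name `lagged_pca` and the random-walk input are hypothetical.

```python
import numpy as np


def lagged_pca(series, n_lags, n_components):
    """PCA on a lag-augmented data matrix (lag-stacking variant)."""
    x = np.asarray(series, dtype=float)
    t, _ = x.shape
    rows = t - n_lags
    # Row i holds [x[i + n_lags], x[i + n_lags - 1], ..., x[i]].
    stacked = np.hstack(
        [x[n_lags - k : n_lags - k + rows] for k in range(n_lags + 1)]
    )
    # Standardize each column so no single series dominates.
    stacked = (stacked - stacked.mean(axis=0)) / stacked.std(axis=0)
    cov = np.cov(stacked, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    scores = stacked @ eigvecs[:, top]  # component time series
    explained = eigvals[top] / eigvals.sum()  # share of total variance
    return scores, explained


# Synthetic random walks standing in for three daily financial series.
rng = np.random.default_rng(0)
toy_series = rng.normal(size=(200, 3)).cumsum(axis=0)
scores, explained = lagged_pca(toy_series, n_lags=2, n_components=3)
```

Each column of `scores` is a candidate indicator series; with two lags, the first two observations are consumed by the lag window, so the output has 198 rows.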

2019

Using Mahalanobis Distance to Classify Aerosol in Southeast Asia based on AERONET-Retrieved Optical Properties

Hans Jarett J. Ong

2019

Undergraduate Thesis

Aerosol types over Southeast Asia (SEA) are determined from Aerosol Robotic Network (AERONET) derived aerosol optical properties for 25 sites using the Mahalanobis distance method. Ångström exponent (AE), single scattering albedo (SSA), and real refractive index (n) are used in a three-dimensional specified clustering method that classifies aerosols into 7 classes, namely: biomass burning white smoke (BB-W), polluted dust (PD), urban industrial developing economy (UI-D), urban industrial (UI), biomass burning dark smoke (BB-D), mineral dust (MD), and marine aerosols. The results show that most of the 25 sites are dominated by PD and UI-D. Specifically, sites from Indonesia, Singapore, and a part of Malaysia are dominated by reflective aerosols like UI and UI-D; sites from Thailand, Philippines, Malaysia, and southern Vietnam are dominated by more absorbing aerosols like PD and UI-D; sites from northern Vietnam and Taiwan are dominated by coarse aerosols like PD and UI-D.
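
The classification step can be sketched in a few lines of Python. This is a minimal illustration, assuming each reference cluster is summarized by its mean and covariance in (AE, SSA, n) space and each observation is assigned to the cluster with the smallest Mahalanobis distance. The function names, the cluster labels `fine_toy` and `coarse_toy`, and all numbers are synthetic placeholders, not values from the thesis.

```python
import numpy as np


def fit_clusters(reference):
    """Return {label: (mean, inverse covariance)} for each reference cluster."""
    stats = {}
    for label, points in reference.items():
        points = np.asarray(points, dtype=float)
        mean = points.mean(axis=0)
        inv_cov = np.linalg.inv(np.cov(points, rowvar=False))
        stats[label] = (mean, inv_cov)
    return stats


def mahalanobis(point, mean, inv_cov):
    d = np.asarray(point, dtype=float) - mean
    return float(np.sqrt(d @ inv_cov @ d))


def classify(point, stats):
    """Assign the point to the reference cluster with the smallest distance."""
    return min(stats, key=lambda label: mahalanobis(point, *stats[label]))


# Synthetic reference clusters in (AE, SSA, n) space; values are made up.
rng = np.random.default_rng(1)
spread = np.array([0.10, 0.02, 0.03])
reference = {
    "fine_toy": rng.normal([1.8, 0.93, 1.45], spread, size=(200, 3)),
    "coarse_toy": rng.normal([0.3, 0.92, 1.50], spread, size=(200, 3)),
}
stats = fit_clusters(reference)
label_a = classify([1.7, 0.93, 1.46], stats)
label_b = classify([0.35, 0.92, 1.50], stats)
```

Because each cluster carries its own covariance, a point is judged relative to the spread and orientation of that cluster rather than by raw Euclidean distance.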

2018

Oral Presentation

Aerosol Types from 25 Southeast Asian AERONET Sites Obtained Using Specified Clustering and Mahalanobis Distance

Hans Jarett J. Ong, Nofel Lagrosas, Uy Sherdon, and 21 more authors

2018

Japan Geoscience Union, Chiba, Japan

This study aims to identify aerosol types over 25 Southeast Asian sites using Aerosol Robotic Network (AERONET) level 2.0 inversion data in a five-dimensional specified classification method. The classification method makes use of the Mahalanobis distance in five dimensions to classify each data point to the closest reference cluster. This study relies on the fact that the method is scale-free and takes into account the obliqueness of the clusters. AERONET data from 7 sites are used to define 7 aerosol reference clusters: mineral dust (MD), polluted dust (PD), urban industrial (UI), urban industrial developing (UID), biomass burning white smoke (BBW), biomass burning dark smoke (BBD), and marine aerosols (Russell et al., 2014). The reference clusters are applied to the following AERONET sites: Thailand (ChiangMaiMetSta, Mukdahan, Omkoi, SilpakornUniv, SongkhlaMetSta, UbonRatchathani); Singapore (Singapore); Vietnam (BacGiang, BacLieu, NGHIADO, NhaTrang); Philippines (ManilaObservatory, NDMarbelUniv); Taiwan (Chiayi, DongshaIsland, EPANCU, Lulin, NCUTaiwan, TaipeiCWB); Malaysia (Kuching, USMPenang); and Indonesia (Bandung, Jambi, Palangkaraya, Pontianak). The results of applying this method to the AERONET data from these sites show that the most dominant aerosol types in the region are PD, UID, and BBW. PD aerosols are characterized by mean Angstrom Exponent (AE) values of 1.19 (±0.238) and mean Single Scattering Albedo (SSA) values of 0.886 (±0.0400). UID aerosols are characterized by mean AE of 1.34 (±0.151) and mean SSA of 0.955 (±0.0249). BBW is characterized by mean AE of 1.87 (±0.144) and mean SSA of 0.925 (±0.0201). This implies that BBW aerosols are finer than PD and UID aerosols, while PD aerosols are more absorbing than UID and BBW aerosols. The dominance of PD and UID aerosols may be attributed to vehicular emissions (from both complete and incomplete combustion). The dominance of BBW in this region may be attributed to open burning of crop residues after harvesting.
In this work, the sites where PD is most dominant are BacGiang (75%), BacLieu (63%), Bandung (55%), ChiangMaiMetSta (69%), DongshaIsland (52%), Lulin (38%), ManilaObservatory (77%), Mukdahan (58%), NDMarbelUniv (41%), NGHIADO (50%), NhaTrang (48%), Omkoi (38%), SilpakornUniv (77%), SongkhlaMetSta (69%), TaipeiCWB (54%), and USMPenang (55%). In these sites, PD is generally dominant from January to May, although it is observed to be scattered throughout the year for some sites. January to May are usually considered dry months in some sites, although the exact dry months differ for each site. UID is the most dominant aerosol type in Chiayi (66%), EPANCU (50%), Jambi (50%), Kuching (47%), NCUTaiwan (56%), Palangkaraya (67%), Pontianak (48%), and Singapore (60%). BBW is most dominant in UbonRatchathani (38%), but is also found in significant amounts in Jambi (22%), Kuching (24%), Lulin (14%), Mukdahan (15%), NDMarbelUniv (21%), NhaTrang (23%), Omkoi (34%), and USMPenang (11%). During September and October, when biomass burning is common in the region, traces of BBW and BBD are found in Kuching, Pontianak, Singapore, and Taipei. The UID type is commonly observed throughout the year.
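
The scale-free property mentioned in this abstract follows directly from the definition of the Mahalanobis distance. The block below is a short worked derivation; the symbols (x for a data point, μ_k and Σ_k for the mean and covariance of reference cluster k, A and b for an arbitrary invertible rescaling) are notation introduced here for illustration and do not come from the presentation.

```latex
% Distance from a point x to reference cluster k, and the assigned class:
d_k(x) = \sqrt{(x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k)},
\qquad
\hat{k}(x) = \arg\min_{k} d_k(x).

% Under an invertible change of units y = A x + b, the cluster statistics
% become \mu_k' = A \mu_k + b and \Sigma_k' = A \Sigma_k A^{\top}, so
(y - \mu_k')^{\top} \Sigma_k'^{-1} (y - \mu_k')
  = (x - \mu_k)^{\top} A^{\top} A^{-\top} \Sigma_k^{-1} A^{-1} A (x - \mu_k)
  = (x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k).
```

The distance is therefore unchanged by rescaling any of the five parameters, and the off-diagonal entries of Σ_k account for clusters whose axes are oblique to the coordinate axes.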

2016

Oral Presentation

Determination of Monthly Aerosol Types in Manila Observatory and Notre Dame of Marbel University from Aerosol Robotic Network (AERONET) measurements

Hans Jarett J. Ong, Nofel Lagrosas, Sherdon Niño Yu Uy, and 4 more authors

2016

AGU Fall Meeting, San Francisco, California, USA

This study aims to identify aerosol types in Manila Observatory (MO) and Notre Dame of Marbel University (NDMU) using Aerosol Robotic Network (AERONET) Level 2.0 inversion data and five-dimensional specified clustering and Mahalanobis classification. The parameters used are the 440-870 nm extinction Ångström exponent (EAE), the 440 nm single scattering albedo (SSA), the 440-870 nm absorption Ångström exponent (AAE), and the 440 nm real and imaginary refractive indices. Specified clustering makes use of AERONET data from 7 sites to define 7 aerosol classes: mineral dust (MD), polluted dust (PD), urban industrial (UI), urban industrial developing (UID), biomass burning white smoke (BBW), biomass burning dark smoke (BBD), and marine aerosols. These are similar to the classes used by Russell et al., 2014. A data point is assigned to the class with the smallest 5-dimensional Mahalanobis distance (Russell et al., 2014; Hamill et al., 2016). This method is applied to all 173 MO data points from January 2009 to June 2015 and to all 24 NDMU data points from December 2009 to July 2015 to look at monthly and seasonal variations of aerosol types. The MO and NDMU aerosols are predominantly PD (77%) and PD & UID (75%), respectively (Figs. 1a-b); PD is predominant from February to May in MO and from February to March in NDMU. PD results from less strict emission and environmental regulations (Catrall 2005). The average SSA value in MO is comparable to the mean SSA for PD (0.89). This can be attributed to the presence of highly absorbing aerosol types, e.g., carbon, which is a product of transportation emissions. The second most dominant aerosol type in MO is UID (15%); in NDMU it is BBW (25%). In Manila, the main source of PD and UID (fine particles) is generally vehicular combustion (Oanh et al., 2006). The detection of BBW in MO from April to May can be attributed to the fires which are common in these dry months. In NDMU, the source of BBW is biomass burning (smoldering).
In this analysis, smoke from biomass burning transported from other Southeast Asian countries is not observed because of the low number of inversion data points. However, fine mode AOD values in NDMU from September to October can exceed 1, which suggests the presence of this transported biomass burning smoke.
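
Once each data point has a class label, the monthly variation described in this abstract reduces to a tally per calendar month. The sketch below is one way to compute it in Python; the function names and the `toy_records` list are hypothetical and are not the MO or NDMU data.

```python
from collections import Counter, defaultdict


def monthly_shares(records):
    """Map each month to the fraction of its data points in each class."""
    counts = defaultdict(Counter)
    for month, label in records:
        counts[month][label] += 1
    return {
        month: {label: n / sum(c.values()) for label, n in c.items()}
        for month, c in sorted(counts.items())
    }


def dominant_by_month(records):
    """Map each month to its most frequent class."""
    return {
        month: max(shares, key=shares.get)
        for month, shares in monthly_shares(records).items()
    }


# Made-up (month, class) pairs for illustration only.
toy_records = [
    (2, "PD"), (2, "PD"), (2, "PD"), (2, "UID"),
    (4, "BBW"), (4, "BBW"), (4, "PD"),
]
shares = monthly_shares(toy_records)
dominant = dominant_by_month(toy_records)
```

With few inversion data points in a month, as noted for NDMU, the shares rest on very small counts and should be read with that in mind.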