
Imputation of Missing Data

INTRODUCTION

Transportation data is of great importance for intelligent transportation systems, yet missing data are inevitable during data collection.

Existing imputation methods face three challenges: potentially useful information is not used efficiently in the modeling process; methods that consider temporal correlation usually assume linear relationships between observed and latent variables; and most techniques fail to quantify uncertainty.

This study introduces the use of a self-measuring multi-task Gaussian process (SM-MTGP) method for imputing missing data.

CONTRIBUTIONS

An SM-MTGP method is proposed to combine features from tasks and inputs to measure similarities jointly.

Dependencies of tasks and inputs are explored via covariance functions under the SM-MTGP framework.

Correlations between responses are captured to provide additional information for enhancing imputation accuracy.

METHODOLOGY

Brief review of MTGP

Assume we have \(Q\) tasks and a set of observations \(Y = \left\{ {{y_{i1}},{y_{i2}}, \ldots ,{y_{iD}}} \right\}, i = 1, \ldots ,Q\), for each corresponding task at \(D\) distinct inputs, where \({y_{ij}}\) is the response of the \({i^{th}}\) task at the \({j^{th}}\) input.

FIGURE 1 Vectorization of matrix Y

When the SM-MTGP model is applied to the imputation of missing values of transfer passenger flow, the information shared between tasks is the temporal relatedness of different days. Transfer passenger flow over \(Q\) days can be treated as \(Q\) tasks, and the \(D\) sampling time intervals per day represent \(D\) distinct inputs. We define a matrix \(Y = \left\{ {{y_{ij}}} \right\}(i = 1,2, \ldots ,Q;j = 1,2, \ldots ,D) \in \mathbb{R}^{Q \times D}\), where \({{y_{ij}}}\) is the number of transfer passengers on the \({i^{th}}\) day (task) in the \({j^{th}}\) time interval (input). By stacking the column vectors of \(Y \in \mathbb{R}^{Q \times D}\), a vector \({\bf{y}} = vec(Y)\) of dimension \(QD\) is obtained (Figure 1).
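As a small illustration of the vectorization step, the sketch below stacks the columns of a toy day-by-interval matrix with NumPy. The sizes and values are made up for the example and are not taken from the Richmond data.

```python
import numpy as np

# Toy matrix: Q = 3 days (tasks) by D = 4 time intervals (inputs).
# The values are placeholders, not real passenger counts.
Q, D = 3, 4
Y = np.arange(Q * D).reshape(Q, D)

# vec(Y): stack the columns of Y on top of each other.
# order="F" (column-major) walks down each column before moving to the next.
y = Y.flatten(order="F")

# Entry (i, j) of Y lands at position j * Q + i of y.
i, j = 2, 1
same_entry = y[j * Q + i] == Y[i, j]
```

With column stacking, all days of the first time interval come first, then all days of the second interval, and so on.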

The MTGP model of \({{\bf{\tilde y}}}\) can be described as Equation (1):

$${{\tilde y}_{ij}} = {m_{ij}} + \varepsilon ,\quad \varepsilon \sim N\left( {0,{\sigma ^2}} \right) \tag{1}$$

where \({m_{ij}}\) is the expected value of the element \({{\tilde y}_{ij}}\), and \(\varepsilon\) is additive Gaussian noise with variance \({{\sigma ^2}}\).

$$m \sim N\left( {0,{\Sigma _Q} \otimes {\Sigma _D}} \right) \label{TGP} \tag{2}$$

$${\Sigma _Q} = K_Q^fG_Q^m, \quad{\Sigma _D} = K_D^fG_D^m \label{covariance matrix} \tag{3}$$

The covariance matrix \({\Sigma _Q}\) is defined as the product of the kernel of day features (tasks) \(K_Q^f\) and the self-measuring kernel \(G_Q^m\), and \({\Sigma _D}\) is defined as the product of the kernel of time-interval features (inputs) \(K_D^f\) and the self-measuring kernel \(G_D^m\).

$${K_Q^f} = k\left( {{y_i},{y_j}} \right) \in \mathbb{R}^{Q \times Q}, \quad {G_Q^m} = g\left( {{y_{i:}},{y_{j:}}} \right) \in \mathbb{R}^{Q \times Q} \tag{4}$$

$${K_D^f} = k\left( {{y_h},{y_l}} \right) \in \mathbb{R}^{D \times D}, \quad {G_D^m} = g\left( {{y_{:h}},{y_{:l}}} \right) \in \mathbb{R}^{D \times D} \tag{5}$$

where \(k\left( {{y_i},{y_j}} \right)\) and \(k\left( {{y_h},{y_l}} \right)\) are the covariances between the features of the \({i^{th}}\) and \({j^{th}}\) days and between the features of the \({h^{th}}\) and \({l^{th}}\) time intervals, respectively. Similarly, \(g\left( {{y_{i:}},{y_{j:}}} \right)\) and \(g\left( {{y_{:h}},{y_{:l}}} \right)\) measure the covariances between the self-measuring observations of the \({i^{th}}\) and \({j^{th}}\) days and between those of the \({h^{th}}\) and \({l^{th}}\) time intervals.
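One way to read Equations (2) to (5) is sketched below in NumPy. Several choices here are assumptions made for illustration and are not stated in the text: both \(k\) and \(g\) are taken to be squared-exponential kernels, the "product" in Equation (3) is taken to be element-wise (which keeps the result positive semi-definite), the day and time-interval features are hypothetical, and the toy matrix is fully observed.

```python
import numpy as np

def rbf_kernel(X, length_scale=1.0):
    """Squared-exponential kernel between the rows of X (assumed form of k and g)."""
    sq = np.sum(X**2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)

rng = np.random.default_rng(0)
Q, D = 4, 6
# Toy, fully observed counts (synthetic), standardized for the kernels.
Y = rng.poisson(50, size=(Q, D)).astype(float)
Y = (Y - Y.mean()) / Y.std()

# Hypothetical features: one-hot day of week, and the interval index.
day_features = np.eye(7)[np.arange(Q) % 7]
time_features = np.arange(D, dtype=float)[:, None]

K_Q = rbf_kernel(day_features)                    # kernel of day features
G_Q = rbf_kernel(Y, length_scale=np.sqrt(D))      # self-measuring: rows y_{i:}
K_D = rbf_kernel(time_features, length_scale=2.0) # kernel of interval features
G_D = rbf_kernel(Y.T, length_scale=np.sqrt(Q))    # self-measuring: columns y_{:h}

# Assumption: element-wise (Hadamard) product of the two kernels.
Sigma_Q = K_Q * G_Q
Sigma_D = K_D * G_D

# Prior covariance of the latent values, as in Equation (2).
prior_cov = np.kron(Sigma_Q, Sigma_D)
```

The Kronecker structure means only a \(Q \times Q\) and a \(D \times D\) matrix have to be specified, even though the joint covariance is \(QD \times QD\).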

Following the principle of MTGP, the joint distribution of \({\tilde Y}\) can be described as Equation (6), where \(\Phi = {\Sigma _Q} \otimes {\Sigma _D} + {\sigma ^2}{\bf{I}}\).

$$\int {p\left( {\tilde Y|M,0,{\sigma ^2}} \right)} p\left( {M|{\Sigma _Q},{\Sigma _D}} \right)dM = N\left( {{\bf{\tilde y}}|{\bf{0}},\Phi } \right) \tag{6}$$

Using a Gaussian process framework and given the observed numbers of transfer passengers, the unobserved passenger flows in \(Y\) can be derived from the predictive equation (7).

$$E[{{\tilde y}_{ab}}|{{{\bf{\tilde y}}}_{obs}},{\Sigma _Q},{\Sigma _D}] = \left( {{\Sigma _{{Q_a}}} \otimes {K_{{D_b}}}} \right)_{obs}^T\Phi _{obs}^{-1}{{{\bf{\tilde y}}}_{obs}} \tag{7}$$

where \({\Phi _{obs}} = {\bf{P}}\Phi {{\bf{P}}^T} \in \mathbb{R}^{M \times M} \) is the covariance matrix over the observed transfer passenger flows in \(Y\), with \({\bf{P}}\) selecting the observed entries (here \(M\) is the number of observed entries, not the latent matrix in Equation (6)). \({{\Sigma _{{Q_a}}}}\) denotes the \({a^{th}}\) column vector of \({\Sigma _Q}\), which measures the similarities between the \({a^{th}}\) day and all the other days among the \(Q\) days, and \({{K_{{D_b}}}}\) denotes the \({b^{th}}\) column vector of \({K_D}\), which represents the covariance between the \({b^{th}}\) time interval and all the remaining time intervals of the \(D\) samples.
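The predictive step in Equation (7) can be sketched as a standard Gaussian process posterior mean over the observed entries. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names `rbf` and `impute` are hypothetical, plain squared-exponential kernels on the day and interval indices stand in for \({\Sigma _Q}\) and \({\Sigma _D}\), the same \({\Sigma _D}\) is used in the cross-covariance term, and the data are synthetic and roughly zero-mean. Because `np.kron(Sigma_Q, Sigma_D)` orders entries day by day, the sketch flattens \(Y\) row by row so that the indices line up.

```python
import numpy as np

def rbf(n, length_scale):
    """Squared-exponential kernel on the indices 0..n-1 (a stand-in covariance)."""
    idx = np.arange(n, dtype=float)
    diff = idx[:, None] - idx[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def impute(Y, observed, Sigma_Q, Sigma_D, noise_var):
    """Fill the unobserved entries of Y (Q x D) with the GP posterior mean."""
    Q, D = Y.shape
    # Entry (i, j) sits at index i * D + j, matching row-major flattening of Y.
    prior = np.kron(Sigma_Q, Sigma_D)
    obs_idx = np.flatnonzero(observed.ravel())
    y_obs = Y.ravel()[obs_idx]
    # Phi_obs: covariance restricted to observed entries, plus noise variance.
    Phi_obs = prior[np.ix_(obs_idx, obs_idx)] + noise_var * np.eye(obs_idx.size)
    alpha = np.linalg.solve(Phi_obs, y_obs)
    # Cross-covariance between every entry and the observed entries.
    posterior_mean = prior[:, obs_idx] @ alpha
    filled = np.where(observed.ravel(), Y.ravel(), posterior_mean)
    return filled.reshape(Q, D)

# Synthetic example: a daily profile shared across days with varying amplitude.
rng = np.random.default_rng(0)
Q, D = 5, 24
t = np.arange(D)
Y_true = np.sin(2 * np.pi * t / D)[None, :] * np.linspace(0.8, 1.2, Q)[:, None]
observed = rng.random((Q, D)) > 0.3          # roughly 30% of entries missing
Y = np.where(observed, Y_true, np.nan)

Y_hat = impute(Y, observed, rbf(Q, 2.0), rbf(D, 3.0), noise_var=1e-2)
```

Solving the linear system with `np.linalg.solve` avoids forming \(\Phi _{obs}^{-1}\) explicitly; for large \(QD\), the cost of this solve is the main computational burden.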

EXPERIMENT

The data analysed comprise six months of passenger flow data collected by WiFi sensors at Richmond railway station, Melbourne, Australia (Figure 2).

FIGURE 2 Map of location and train lines of Richmond station.

FIGURE 3 Map of 12 WiFi sensors distribution.

The 12 deployed sensors are distributed across platforms 7-10 and the two side underpasses (Figure 3).

IMPUTATION PERFORMANCE

Figure 4 shows the RMSE of the various algorithms under the discrete missing pattern. The improvement in RMSE achieved by SM-MTGP is around 60% at all three missing rates.
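For reference, RMSE and a relative improvement figure can be computed as in the sketch below. The text does not define how the improvement is calculated, so the relative reduction with respect to a baseline used here is an assumption, and the numbers are made up purely to exercise the functions.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between true and imputed values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def relative_improvement(rmse_baseline, rmse_model):
    """Assumed definition: fraction by which the model reduces the baseline RMSE."""
    return (rmse_baseline - rmse_model) / rmse_baseline

# Made-up values at four artificially removed entries (not results from the study).
truth = [10, 20, 30, 40]
baseline_imputed = [15, 25, 25, 35]   # every error is 5
model_imputed = [12, 22, 28, 38]      # every error is 2

rmse_baseline = rmse(truth, baseline_imputed)
rmse_model = rmse(truth, model_imputed)
improvement = relative_improvement(rmse_baseline, rmse_model)
```

In an imputation experiment, the error is evaluated only on entries that were deliberately removed, since those are the only missing values whose true counts are known.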

FIGURE 4 RMSE Results for Different Missing Ratios of Discrete missing data.

Three mixed missing patterns under different missing ratios are reported (Figures 5-7). The SM-MTGP method still outperforms all the other methods, with improvements in RMSE of up to 60%.

FIGURE 5 RMSE Results for Different Missing Ratios of Mixed Missing Data with One Random Day Missing.

FIGURE 6 RMSE Results for Different Missing Ratios of Mixed Missing Data with Two Random Days Missing.

FIGURE 7 RMSE Results for Different Missing Ratios of Mixed Missing Data with Four Random Days Missing.

CONCLUSIONS

SM-MTGP improves imputation accuracy by around 60% in RMSE compared with the base model in all the tested missing scenarios.

SM-MTGP significantly outperforms the other methods under large missing ratios.

Ongoing research focuses on incorporating other features into the algorithm so that it can be applied to large-scale transit networks, and on reducing the computational complexity of the model.

For more detailed information, please contact our TUPA members below:
Wenhua Jiang, [email protected]

2019 Group BBQ

 

98th TRB Conference journey

The Transportation Research Board (TRB) 98th Annual Meeting was held January 13–17, 2019, at the Walter E. Washington Convention Center in Washington, D.C.

Two PhD students (Wenhua Jiang and Chunliang Wu) and three Masters students (Yan Xia, Huamei Zhu and Ning Xu) presented their research outcomes at TRB. Very well done, especially those from the Joint Masters Program.

Monash Transportation ranked #1

The latest subject rankings released by the Center for World University Rankings (for 2017) put Monash in first place in the transportation field:
https://cwur.org/2017/subjects.php#Transportation

THE 24TH INTERNATIONAL CONFERENCE OF HONG KONG SOCIETY FOR TRANSPORTATION STUDIES

http://www.hksts.org/conf.htm

  1. Parallel computing for user equilibrium problem in a cluster with integrated computing resources by Xinyuan Chen, Zhiyuan Liu and Inhi Kim
  2. Exploring Human Mobility Pattern Using Complex Network Theory and Spatial Econometric Model by Chunliang Wu, Inhi Kim and Zhiyuan Liu

Tianqi Gu’s mid review

I am very glad that Tianqi Gu has passed his mid-review. His research topic is “Inter Relationship between Shared Bike System and Mass Transit considering Traffic Condition”, supervised by Prof. Graham Currie and me. Prof. Geoff Rose served as chair, and Dr. Nan Zheng and Dr. Ye Lu served as panel members. Congratulations, Tianqi, and keep it up!!

Ogden 2018 Transport Lecture

Monash ITS welcomes Prof. Bart van Arem from Delft University of Technology, the Netherlands, who is giving a lecture to transport lovers at the State Library in Melbourne.

Lecture Topic

Driverless vehicles and the future of urban transport: Beyond the Hype

Abstract

Melbourne faces a considerable challenge to maintain its much celebrated livability in the face of population growth. Transport is critical and smart solutions are needed. Autonomous or driverless vehicles are regularly promoted as the ‘solution’ to a wide range of urban transport challenges. Will we sleepwalk into an urban mobility future defined for us by the driverless vehicle industry or make conscious choices about the role that advanced technology should play in creating a transport system to support the type of city we want Melbourne to become? This year’s Ogden Transport Lecture looks beyond the hype and draws insight from international research to put driverless vehicles and the future of urban mobility under the microscope.

Speaker Biography

Professor Bart van Arem has an international reputation for research that focuses on analysing and modelling the implications of intelligent vehicles. Bart is Director of the TU Delft Transport Institute and a Professor in the Department of Transport & Planning, Faculty of Civil Engineering and Geosciences, Delft University of Technology in the Netherlands. He has held previous appointments at the University of Twente and at TNO, the Netherlands Organisation for Applied Scientific Research.

TUPA bbq

Wenhua’s confirmation

I am very glad that Wenhua has passed her PhD confirmation. Her research topic is “Short-term passenger demand prediction and crowd management for special events”, supervised by Dr. Nan Zheng, Dr. Paul Reichl, and me. Prof. Hai Vu served as chair, and Dr. Kun An and Dr. Ha Bui served as panel members. Congratulations, Wenhua, and keep it up!!

Xinyuan’s Final Review

A big congratulations to Xinyuan on passing the final review. Prof. Graham Currie served as chair, and Dr. Nan Zheng and Dr. Qianbing Zhang served as panel members for this milestone. Good luck with your PhD thesis and job hunting!!