Degree Name

Doctor of Philosophy


School of Computing and Information Technology


With the increasing prevalence of camera networks in public spaces, a significant volume of video data is being generated. Automating the identification and re-identification of traffic components, such as vehicles, across non-overlapping cameras has many potential applications for smart cities. This task is known as vehicle re-identification.

Vehicle re-identification plays an important role in intelligent transportation, urban computing, and intelligent surveillance, where it is used for discovering and locating cars, estimating travel times, and analysing traffic behaviour. However, it faces many challenges caused by cross-view discrepancies in viewpoint, illumination, occlusion, and background clutter. With the rapid development of deep learning technologies and the introduction of large-scale datasets, vehicle re-identification has seen remarkable advancements in recent years.

While existing methods have reached state-of-the-art results, there is a lack of analysis of the datasets as well as of the explainability of existing pipelines. In fact, some vehicle datasets show recognisable faces. Because existing re-identification models are black boxes, it is unknown what happens behind the scenes. Therefore, the main objective of this thesis is to introduce the practice of more responsible vehicle re-identification in three phases.

The first phase introduces a generative AI image-to-image translation technique to create a new dataset that ensures privacy and anonymity. As a consequence, drivers’ and passengers’ identities are protected and respected. Moreover, diversity is enabled and no unjustified surveillance can be performed.

The second phase focuses on building a vehicle re-identification algorithm using an attention-based architecture. Models using transformer backbones have been shown to outperform models relying on convolutional neural networks. Therefore, this thesis proposes an architecture based on self-attention and outlook-attention to extract features.

The third phase assesses the models on the generated dataset. The experiments show that more private vehicle re-identification using attention-based models is feasible and should be prioritised.

We put so much time and effort into striving for the best results by tweaking every possible hyperparameter of deep learning models, yet we fail to consider the importance of the human element in vehicle re-identification. Hence, we need to set out boundaries and regulations for a more responsible practice of vehicle re-identification and put humans at the centre of attention.

FoR codes (2008)

080104 Computer Vision, 080106 Image Processing



Unless otherwise indicated, the views expressed in this thesis are those of the author and do not necessarily represent the views of the University of Wollongong.