Deep Learning for Assistive Navigation of Vision-impaired People
Visually impaired people face numerous challenges in daily life due to their limited ability to perceive their surroundings. This lack of visual information can lead to difficulties in navigation, object recognition, and environment perception. Traveling safely and independently is one of the most challenging tasks, even with the assistance of white canes or guide dogs. Minor collisions and falls are a daily occurrence, posing significant risks to their safety and well-being. Additionally, the lack of vision restricts their ability to explore new places freely, impacting their quality of life and their ability to engage fully in society.
Assistive navigation systems are designed to address these challenges by using advanced sensors as substitutes for vision. These technologies perceive the surroundings and provide visually impaired individuals with real-time information about their environment. However, sensor-based obstacle detectors have limited functionality and reliability in complex and dynamic environments. Deep learning algorithms for scene perception offer a more comprehensive solution: they can analyze and interpret visual data with high accuracy, enabling the recognition of walkable paths and the detection of obstacles, the two most crucial factors for safe travel.
Semantic segmentation offers a unified and efficient solution for pedestrian lane detection and obstacle sensing. By accurately classifying each pixel in an image, semantic segmentation allows for precise identification of pedestrian lanes and distant obstacles, ensuring that users can stay within walkable regions and receive collision warnings in advance. Combining semantic segmentation with monocular depth estimation further enhances this capability, offering a comprehensive 3D understanding of the environment. Monocular depth estimation uses a single camera to predict the distance and spatial arrangement of objects, providing depth information that complements the segmentation data.
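As a minimal illustration of how these two outputs can be fused, the sketch below combines a per-pixel segmentation mask with a metric depth map to raise an early collision warning. This is not code from the thesis; the class labels and the warning threshold are illustrative assumptions.

    import numpy as np

    def collision_warning(seg_mask, depth_map, obstacle_id=2, warn_m=2.0):
        """Return True if any obstacle pixel is predicted closer than warn_m.

        seg_mask    -- (H, W) integer class labels from a segmentation network
        depth_map   -- (H, W) metric depth predictions (meters) from a depth network
        obstacle_id -- hypothetical class label for obstacles (an assumption here)
        warn_m      -- hypothetical warning distance in meters (an assumption here)
        """
        obstacle_depths = depth_map[seg_mask == obstacle_id]
        return obstacle_depths.size > 0 and bool(obstacle_depths.min() < warn_m)

    # Tiny usage example: one obstacle pixel at 1.2 m triggers the warning.
    seg = np.array([[0, 2], [1, 1]])
    depth = np.array([[5.0, 1.2], [3.0, 4.0]])
    print(collision_warning(seg, depth))  # True

Fusing the two predictions this way lets a warning fire before contact, rather than only when a proximity sensor is already within range.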
This thesis makes three major contributions. First, we review and evaluate state-of-the-art algorithms for pedestrian lane detection on a benchmark lane dataset. Second, we propose LSS-Net, a lightweight segmentation model for pedestrian scene understanding with low computational requirements. As part of this work, we create a novel pedestrian scene dataset, called TrueSight, for training and evaluating algorithms targeting real-world assistive navigation. Third, we propose AMT-Net, an efficient multi-task model for joint semantic segmentation and monocular depth estimation. This model uses a single-decoder architecture and achieves high computational efficiency on two challenging datasets. Extensive experiments demonstrate the state-of-the-art performance of the proposed networks. Overall, the thesis contributes novel algorithms and labeled datasets for two important tasks in assistive navigation for vision-impaired people: real-time pedestrian lane detection and depth estimation from color images.
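To make the single-decoder idea concrete, here is a minimal sketch, assuming a shared encoder and one shared decoder feeding two lightweight task heads. The layer sizes and names are placeholders for illustration, not the actual AMT-Net architecture.

    import torch
    import torch.nn as nn

    class MultiTaskNet(nn.Module):
        """Illustrative single-decoder multi-task network (not AMT-Net itself)."""

        def __init__(self, num_classes=19):
            super().__init__()
            # Shared encoder: downsamples the input by a factor of 4.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            )
            # One shared decoder: upsamples back to the input resolution.
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(32, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            )
            # Two lightweight heads branch from the same decoded features.
            self.seg_head = nn.Conv2d(32, num_classes, 1)  # per-pixel class logits
            self.depth_head = nn.Conv2d(32, 1, 1)          # per-pixel depth

        def forward(self, x):
            shared = self.decoder(self.encoder(x))
            return self.seg_head(shared), self.depth_head(shared)

    # Usage: a single forward pass yields both predictions from shared features.
    seg_logits, depth = MultiTaskNet()(torch.randn(1, 3, 256, 256))

Sharing one decoder means the expensive upsampling path runs once per frame for both tasks, which is the source of the efficiency gain over two independent decoders.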
History
Year: 2024
Thesis type: Doctoral thesis