DeDu: Building a deduplication storage system over cloud computing
This paper presents DeDu, a deduplication storage system built over cloud computing. The system consists of two major components: a front-end deduplication application and the Hadoop Distributed File System (HDFS). HDFS is a common back-end distributed file system that is used together with a Hadoop database. We use HDFS to build a mass storage system and a Hadoop database to build a fast indexing system. Combined with the deduplication application, these components yield a scalable, parallel deduplicated cloud storage system. We use VMware to generate a simulated cloud environment, and the simulation results demonstrate that our deduplication cloud storage system is more efficient than traditional deduplication approaches.
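To make the division of labor between the storage back end and the index concrete, the following is a minimal in-memory sketch, not the system's actual implementation. It assumes file-level deduplication keyed by a SHA-256 fingerprint; the class `DedupStore` and its methods are hypothetical names introduced only for illustration, and plain dictionaries stand in for the Hadoop Distributed File System (content storage) and the Hadoop database (fingerprint index).

```python
import hashlib


class DedupStore:
    """Toy file-level deduplication store (illustrative assumption only)."""

    def __init__(self):
        self.blobs = {}  # fingerprint -> content; stands in for the file system
        self.index = {}  # fingerprint -> reference count; stands in for the index database
        self.names = {}  # logical file name -> fingerprint

    def put(self, name, data):
        """Store data under name; return True only if new content was written."""
        if name in self.names:
            self.delete(name)  # overwriting a name releases its old content first
        fp = hashlib.sha256(data).hexdigest()
        is_new = fp not in self.index
        if is_new:
            # First time this content is seen: write it once and index it.
            self.blobs[fp] = data
            self.index[fp] = 0
        self.index[fp] += 1  # duplicates only add a reference
        self.names[name] = fp
        return is_new

    def get(self, name):
        """Return the content stored under a logical file name."""
        return self.blobs[self.names[name]]

    def delete(self, name):
        """Drop one reference; remove the content when no references remain."""
        fp = self.names.pop(name)
        self.index[fp] -= 1
        if self.index[fp] == 0:
            del self.index[fp]
            del self.blobs[fp]
```

In this sketch, uploading the same bytes under two names stores the content once and records two references, so deleting one name leaves the other readable. A real deployment would replace the dictionaries with calls to the distributed file system and the index database.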