Using grouping strategy and pattern discovery for delta extraction in a limited collaborative environment
This work considers delta extraction in a distributed environment where collaboration from highly autonomous operational database management systems is limited to granting read-only access to a set of selected relational tables. Because of the inherently large volume of data in a data warehouse system, it is critical to minimise communication costs as much as possible. Based on the observation that two consecutive snapshots usually do not differ much, a statistics-based group hash method is developed to minimise the volume of data required to complete the data extraction. In addition, to relax the assumption that changes to remote data are caused only by random events, we define a progression pattern to describe data changes with temporal regularities, and we propose a method for progression pattern discovery.
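The general idea behind comparing snapshots by group hash can be illustrated with a minimal sketch. It is not the method developed in this work: the statistical choice of group size is not reproduced, and the names `group_hashes`, `changed_groups` and the fixed `group_size` are hypothetical. The sketch assumes integer primary keys and that each snapshot is available as a mapping from key to row value. Only groups whose hashes differ would need to be transferred in full.

```python
import hashlib
from collections import defaultdict


def group_hashes(snapshot, group_size):
    """Hash rows in groups defined by primary-key range.

    Assumption: keys are integers, and group id = key // group_size.
    """
    groups = defaultdict(list)
    for key in sorted(snapshot):
        groups[key // group_size].append((key, snapshot[key]))
    return {
        gid: hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()
        for gid, rows in groups.items()
    }


def changed_groups(old_hashes, new_hashes):
    """Return the ids of groups whose hash differs between two snapshots."""
    all_ids = old_hashes.keys() | new_hashes.keys()
    return sorted(
        gid for gid in all_ids if old_hashes.get(gid) != new_hashes.get(gid)
    )


# Two consecutive snapshots that differ in one updated row and one new row.
old = {1: "a", 2: "b", 11: "c", 25: "d"}
new = {1: "a", 2: "B", 11: "c", 25: "d", 31: "e"}
delta_groups = changed_groups(group_hashes(old, 10), group_hashes(new, 10))
print(delta_groups)
```

In this sketch a larger `group_size` means fewer hashes to exchange but more rows to fetch for each changed group, which is the trade-off that a statistical choice of group size would have to balance.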