The performance of data processing in distributed information systems strongly depends on the efficient scheduling of the applications that access data at the remote sites. This work assumes a typical model of a distributed information system in which a central site is connected to a number of highly autonomous remote sites. An application started by a user at the central site is decomposed into several data processing tasks to be independently processed at the remote sites. The objective of this work is to find a method for optimizing task processing schedules at the central site. We define an abstract model of data and a system of operations that implements the data processing tasks. Our abstract data model is general enough to represent many specific data models. We show how an entirely parallel schedule can be transformed into a more efficient hybrid schedule in which certain tasks are processed simultaneously while the other tasks are processed sequentially. The transformations proposed in this work are guided by a cost-based optimization model whose objective is to reduce the total data transmission time between the remote sites and the central site. We show how the properties of data integration expressions can be used to find more efficient schedules of data processing tasks in distributed information systems.
Getta, J. R. (2011). Optimization of task processing schedules in distributed information systems. Proceeding of International Conference on Informatics Engineering and Information Science, ICIEIS 2011 (pp. 333-345). New York: Springer.