
Assessing the impact of concurrent replication with canceling in Parallel Jobs

dc.creator: Qiu, Zhan [spa]
dc.creator: Pérez, Juan F. [spa]
dc.date.accessioned: 2020-08-28T15:49:14Z
dc.date.available: 2020-08-28T15:49:14Z
dc.date.created: 2015-02-09 [spa]
dc.description.abstract: Parallel job processing has become a key feature of many software applications, e.g., in scientific computing. Parallelization allows these applications to exploit large resource pools, such as cloud or grid data centers. However, a job composed of a large number of parallel tasks will suffer a failure if any of its tasks fail, requiring reprocessing and additional delays. In this paper, we explore the effect that the replication of parallel jobs has on the job reliability and response time, as well as on resource utilization. The replication mechanism consists of concurrently processing replicas, at either the job or the task level, retrieving the results of the replica that finishes first, if any, and canceling any remaining replica in process. We propose a stochastic model that explicitly considers parallel job processing, replication at both the job and the task level, and handles general arrival processes. We develop a numerically-efficient algorithm to solve large-scale instances of the model and compute key performance metrics. We observe that the task cancellation mechanism offers an effective way of limiting the increase in resource utilization, allowing the use of replicas that not only increase the job reliability, but have the potential to reduce the response times. [eng]
dc.format.mimetype: application/pdf
dc.identifier.doi: https://doi.org/10.1109/MASCOTS.2014.13
dc.identifier.issn: EISBN: 978-1-4799-5610-4
dc.identifier.uri: https://repository.urosario.edu.co/handle/10336/28505
dc.language.iso: eng [spa]
dc.publisher: IEEE [spa]
dc.relation.citationEndPage: 40
dc.relation.citationStartPage: 31
dc.relation.citationTitle: 2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems
dc.relation.ispartof: IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems, EISBN: 978-1-4799-5610-4 (2014); pp. 31-40 [spa]
dc.relation.uri: https://ieeexplore.ieee.org/document/7033635 [spa]
dc.rights.accesRights: info:eu-repo/semantics/restrictedAccess
dc.rights.acceso: Restricted (access for specific groups) [spa]
dc.source: 2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems [spa]
dc.source.instname: instname:Universidad del Rosario
dc.source.reponame: reponame:Repositorio Institucional EdocUR
dc.subject.keyword: Reliability [spa]
dc.subject.keyword: Time factors [spa]
dc.subject.keyword: Computational modeling [spa]
dc.subject.keyword: Numerical models [spa]
dc.subject.keyword: Vectors [spa]
dc.subject.keyword: Generators [spa]
dc.subject.keyword: Equations [spa]
dc.title: Assessing the impact of concurrent replication with canceling in Parallel Jobs [spa]
dc.title.TranslatedTitle: Evaluación del impacto de la replicación concurrente con la cancelación en trabajos paralelos [spa]
dc.type: bookPart [eng]
dc.type.hasVersion: info:eu-repo/semantics/publishedVersion
dc.type.spa: Parte de libro [spa]