Item
Metadata only

Enhancing reliability and response times via replication in computing clusters

dc.creator: Qiu, Zhan. [spa]
dc.creator: Pérez, Juan F. [spa]
dc.date.accessioned: 2020-08-28T15:49:14Z
dc.date.available: 2020-08-28T15:49:14Z
dc.date.created: 2015-08-24 [spa]
dc.description.abstract: Computing clusters have been widely deployed for scientific and engineering applications to support intensive computation and massive data operations. As applications and resources in a cluster are subject to failures, fault-tolerance strategies are commonly adopted, sometimes at the expense of additional delays in job response times, or unnecessarily increasing resource usage. In this paper, we explore concurrent replication with canceling, a fault-tolerance approach where jobs and their replicas are processed concurrently, and the successful completion of either triggers the removal of its replica. We propose a stochastic model to study how this approach affects the cluster service level objectives (SLOs), particularly the offered response time percentiles. In addition to the expected gains in reliability, the proposed model allows us to determine the regions of the utilization where introducing replication with canceling effectively reduces the response times. Moreover, we show how this model can support resource provisioning decisions with reliability and response time guarantees. [eng]
dc.format.mimetype: application/pdf
dc.identifier.doi: https://doi.org/10.1109/INFOCOM.2015.7218512
dc.identifier.issn: EISBN: 978-1-4799-8381-0
dc.identifier.uri: https://repository.urosario.edu.co/handle/10336/28504
dc.language.iso: eng [spa]
dc.publisher: IEEE [spa]
dc.relation.citationEndPage: 1363
dc.relation.citationStartPage: 1355
dc.relation.citationTitle: 2015 IEEE Conference on Computer Communications (INFOCOM)
dc.relation.ispartof: IEEE Conference on Computer Communications (INFOCOM), EISBN: 978-1-4799-8381-0 (2015); pp. 1355-1363 [spa]
dc.relation.uri: https://ieeexplore.ieee.org/abstract/document/7218512 [spa]
dc.rights.accesRights: info:eu-repo/semantics/restrictedAccess
dc.rights.acceso: Restringido (Acceso a grupos específicos) [spa]
dc.source: 2015 IEEE Conference on Computer Communications (INFOCOM) [spa]
dc.source.instname: instname:Universidad del Rosario
dc.source.reponame: reponame:Repositorio Institucional EdocUR
dc.subject.keyword: Servers [spa]
dc.subject.keyword: Time factors [spa]
dc.subject.keyword: Reliability [spa]
dc.subject.keyword: Computational modeling [spa]
dc.subject.keyword: Conferences [spa]
dc.subject.keyword: Computers [spa]
dc.subject.keyword: Switches [spa]
dc.title: Enhancing reliability and response times via replication in computing clusters [spa]
dc.title.TranslatedTitle: Mejora de la confiabilidad y los tiempos de respuesta mediante la replicación en clústeres informáticos [spa]
dc.type: bookPart [eng]
dc.type.hasVersion: info:eu-repo/semantics/publishedVersion
dc.type.spa: Parte de libro [spa]