Replay-based synchronization of timestamps in event traces of massively parallel applications
Abstract
Event traces are helpful in understanding the performance behavior
of message-passing applications since they allow in-depth analyses
of communication and synchronization patterns. However, the absence
of synchronized hardware clocks may render the analysis ineffective
because inaccurate relative event timings can misrepresent the
logical event order and lead to errors when quantifying the impact
of certain behaviors. Although linear offset interpolation can
restore consistency to some degree, inaccuracies and time-dependent
drifts may still disarrange the original succession of events, especially during longer runs. In our earlier work, we have
presented an algorithm that removes the remaining violations of the
logical event order postmortem and, in addition, have outlined the
initial design of a parallel version. Here, we complete the parallel
design and describe its implementation within the Scalasca
trace-analysis framework. We demonstrate its suitability for
large-scale applications running on more than a thousand application
processes and evaluate its accuracy by showing that it eliminates
inconsistent inter-process timings while preserving the length of
local intervals.
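
As a rough illustration of the two notions the abstract relies on, the following minimal sketch shows linear offset interpolation between two offset measurements and a check for a violated logical event order (a message that appears to be received before it was sent). It is not the algorithm presented in the article or the Scalasca implementation; the function names, parameters, and the `min_latency` bound are assumptions introduced here for illustration only.

```python
def interpolate_offset(t, t1, o1, t2, o2):
    """Map a local timestamp t onto the reference time base.

    Assumes two offset measurements against a reference clock:
    offset o1 taken at local time t1 and offset o2 at local time t2
    (hypothetical inputs, e.g. measured at program start and end).
    The clock drift is assumed to be constant in between.
    """
    drift = (o2 - o1) / (t2 - t1)       # offset change per unit of local time
    return t + o1 + drift * (t - t1)    # local time plus interpolated offset


def violates_event_order(send_ts, recv_ts, min_latency=0.0):
    """Return True if a receive appears earlier than its send plus latency.

    min_latency is an assumed lower bound on the message transfer time.
    """
    return recv_ts < send_ts + min_latency


# Example: the sender's clock is corrected, the receiver is the reference.
send_corrected = interpolate_offset(5.0, t1=0.0, o1=1.0, t2=10.0, o2=3.0)
recv_reference = 6.5
order_broken = violates_event_order(send_corrected, recv_reference, min_latency=0.1)
```

In this example the interpolated send time still lies after the recorded receive time, which is the kind of remaining inconsistency that a postmortem correction has to remove.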
Section
Proposal for Special Issue Papers