… this paper gives a good summary and explanation of some of the problems, shortcomings, and flaws involved.

——————————————————————

Let us consider some of the obstacles to efficient execution.
These problems should be no surprise to the distributed
computing expert, but are far from obvious to end
users.

Dispatch Latency. The cost of dispatching a job within
a conventional batch system (not to mention a large scale
grid) is surprisingly high. Dispatching a job from a queue
to a remote CPU requires many network operations to authenticate
the user and negotiate access to the resource, synchronous
disk operations at both sides to log the transaction,
data transfers to move the executable and other details, not
to mention the unpredictable delays associated with contention
for each of these resources. When a wide area
system is under heavy load, dispatch latency can easily be
measured in minutes. For batch jobs that intend to run for
hours, this is of little concern. But, for many short running
jobs, this can be a serious performance problem. Even if
we assume that a system has a relatively fast dispatch latency
of one second, it would be foolish to run jobs lasting
one second: one job would complete before the next can
be dispatched, resulting in only one CPU being kept busy.
Clearly, there is an incentive to keep job granularity large
in order to hide the worst case dispatch latencies and keep
CPUs busy.
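
The one-second argument can be checked with a back-of-the-envelope model. The sketch below is not from the paper: it assumes a single dispatcher that starts jobs strictly one after another, and the name `busy_cpus` and its parameters are made up for illustration.

```python
def busy_cpus(job_seconds, dispatch_seconds, total_cpus):
    """Estimate how many CPUs one serial dispatcher can keep busy.

    Assumption: one job is dispatched every `dispatch_seconds` and each
    job runs for `job_seconds`, so at steady state roughly
    job_seconds / dispatch_seconds jobs overlap.
    """
    if dispatch_seconds <= 0:
        return total_cpus
    overlapping = max(1, int(job_seconds // dispatch_seconds))
    return min(total_cpus, overlapping)


if __name__ == "__main__":
    for job_seconds in (1, 10, 60, 3600):
        print(job_seconds, busy_cpus(job_seconds, 1, 100))
```

Under these assumptions, one-second jobs with a one-second dispatch latency keep a single CPU busy no matter how large the pool is, while hour-long jobs can saturate a 100-CPU pool.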
   Failure Probability. On the other hand, there is an incentive
not to make individual jobs too long. Any kind of
computer system has the possibility of hardware failure, but
a shared computing environment also has the possibility that
a job can be preempted for a higher priority, usually resulting
in a rollback to the beginning of the job on another CPU.
Short runs provide a kind of checkpointing, as a small result
that is completed need not be regenerated. Long runs
also magnify heterogeneity in the pool. For instance, a job
that should take 10 seconds on a typical machine but takes
30 on the slowest isn’t a problem if batched in small sets.
The other machines will just cycle through their sets faster.
But, if jobs are chosen such that they run for hours even on
the fastest machine, the workload will incur a long delay
waiting for the final job to complete on the slowest. Another
downside to jobs that run for many hours is that it is
difficult to discriminate between a healthy long-running job
and a job that is stuck and not making progress. Thus, there
are advantages to short runs; however, they increase overhead
and thus should be mitigated whenever possible. An
abstraction has to determine the appropriate job granularity,
noting that this depends on numerous factors of the job, the
grid, and the particular execution.
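
The heterogeneity point (10 seconds on a typical machine, 30 on the slowest) can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: idle machines pull the next batch from a shared queue, dispatch overhead and failures are ignored, and the function `makespan` is hypothetical rather than part of the paper's system.

```python
import heapq


def makespan(num_tasks, batch_size, secs_per_task):
    """Simulate a pull-based pool where each idle machine grabs a batch.

    secs_per_task[i] is how long machine i needs for a single task.
    Returns the time at which the last batch finishes.
    """
    num_batches = -(-num_tasks // batch_size)  # ceiling division
    # Min-heap of (time the machine becomes free, machine index).
    free_at = [(0.0, i) for i in range(len(secs_per_task))]
    heapq.heapify(free_at)
    finish = 0.0
    remaining = num_tasks
    for _ in range(num_batches):
        t, i = heapq.heappop(free_at)
        size = min(batch_size, remaining)
        remaining -= size
        done = t + size * secs_per_task[i]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish


if __name__ == "__main__":
    pool = [10, 10, 10, 30]  # three typical machines and one slow one
    print("small batches:", makespan(400, 1, pool))
    print("large batches:", makespan(400, 100, pool))
```

With small batches the fast machines simply take more of the work. With one large batch per machine, the whole workload waits for the slowest machine to finish its share.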
   Number of Compute Nodes. It is easy to assume that
more compute nodes is automatically better. This is not
always true. In any kind of parallel or distributed problem,
each additional compute node presents some overhead
in exchange for extra parallelism. All-Pairs is particularly
bad, because the data is not easily partitionable: each node
needs all of the data in order to compute any significant subset
of the problem. This restriction can be lifted, but only
if the requirement to maintain in-order compilation of results
and the preference for a resource-blind bag of tasks
are also removed. Data must be transferred to that node by
some means, which places extra load on the data access system,
whether it is a shared filesystem or a data transfer service.
More parallelism means more concurrently running
jobs for both the engine and the batch system to manage,
and a greater likelihood of a node failing, or worse, concurrent
failures of several nodes, which consume the attention
(and increase the dispatch latency) of the queuing system.
   Data Distribution. After choosing the proper number of
servers, we must then ask how to get the data to each computation.
A traditional cluster makes use of a central file
server, as this makes it easy for programs to access data on
demand. However, it is much easier to scale the CPUs of a
cluster than it is to scale the capacity of the file server. If the
same input data will be re-used many times, then it makes
sense simply to store the inputs on each local disk, getting
better performance and scalability. Many dedicated clusters
provide fixed local data for common applications (e.g.
genomic databases for BLAST [2]). However, in a shared
computing environment, there are many different kinds of
applications and competition for local disk space, so the
system must be capable of adjusting the system to serve new
workloads as they are submitted. Doing this requires significant
network traffic, which if poorly planned can make
distributed solving worse than solving locally.
    Hidden Resource Limitations. Distributed systems are
full of unexpected resource limitations that can trip up the
unwary user. The major resources of processing, memory,
and storage are all managed by high level systems, reported
to system administrators, and well known to end users.
However, systems also have more prosaic resources. Examples
are the maximum number of open files in a kernel,
the number of open TCP connections in a network translation
device, the number of licenses available for an application,
or the maximum number of jobs allowed in a batch
queue. In addition to navigating the well-known resources,
an execution engine must also be capable of recognizing
and adjusting to secondary resource limitations.
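
As a small illustration of checking one such secondary limit, the sketch below reads the per-process cap on open file descriptors using Python's standard `resource` module, which is available on Unix-like systems. This is the per-process limit, not the kernel-wide maximum mentioned above, and the helper name `open_file_headroom` is an assumption made for this example.

```python
import resource


def open_file_headroom(planned_open_files):
    """Return how many more descriptors the soft limit would still allow.

    Returns None when no soft limit is configured. A negative result
    means the planned number of open files would exceed the limit.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft - planned_open_files


if __name__ == "__main__":
    print("headroom for 500 open files:", open_file_headroom(500))
```

An engine could run a check like this before starting and reduce its concurrency when the headroom is negative, rather than failing partway through a run.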
  Semantics of Failure. In any kind of distributed system,
failures are not only common, but also hard to define.
It isn’t always clear what constitutes failure, whether the
failure is permanent or temporary, and whether a localized
failure will affect other running pieces of the workload. If
the engine managing the workflow doesn’t know the structure
of the problem, the partitioning, and the specifics about
the job that failed, it will be almost impossible to recover
cleanly. This is problematic, because expanding system size
dramatically increases our chances of running into failures;
no set of any significant number of machines has all of them
online all the time.
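
The claim that larger systems run into failures more often follows from simple arithmetic. The sketch below assumes that machines fail independently and that each is online with the same probability; both are simplifications, and the function name is illustrative.

```python
def prob_all_online(per_machine_availability, num_machines):
    """Probability that every machine is online at once.

    Assumes independent machines with identical availability.
    """
    return per_machine_availability ** num_machines


if __name__ == "__main__":
    for n in (1, 10, 100, 500):
        print(n, round(prob_all_online(0.99, n), 4))
```

Even with machines that are each online 99% of the time, a pool of 100 has all of them online only about a third of the time under these assumptions.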
    It should be clear that our concern in this work is not
how to find the optimal parallelization of a workload under
ideal conditions. Rather, our goal is to design robust abstractions
that result in reasonable performance, even under
some variation in the environment.

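Then, centered on these problems, the system proposed in the paper presents a design to show how they can be avoided.

In other words, these problems are to be avoided throughout the design of the whole system.

This part is an overview and gives a flowchart of the overall system.

It then gives the basic high-level workflow: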

1. Model the system 2. Distribute the data 3. Dispatch batch jobs 4. Clean up the system

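The first part knocks the reader flat with functions: it throws out three functions, and the accompanying explanation is enough to scare anyone.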

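The second part covers the form that data distribution takes, presented as a figure. It expands the data distribution portion of the overall flowchart given earlier and describes the operations in detail. This part uses a spanning tree algorithm to distribute the data, and gives a chart comparing the efficiency of different data distribution algorithms.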
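
To see why a spanning tree helps, here is a minimal sketch of one possible schedule. It is not the paper's actual algorithm, whose details are not reproduced in this post. It assumes that every node already holding the data can forward it to one new node per round, and the names `spanning_tree_schedule` and `sequential_rounds` are made up for illustration.

```python
def spanning_tree_schedule(num_nodes):
    """Plan transfers so every node holding the data forwards it each round.

    Node 0 is the source. Returns a list of rounds; each round is a list
    of (sender, receiver) pairs that can proceed in parallel. Every
    receiver has exactly one sender, so the pairs form a spanning tree.
    """
    have = [0]
    next_node = 1
    rounds = []
    while next_node < num_nodes:
        transfers = []
        for sender in list(have):  # snapshot: new receivers wait a round
            if next_node >= num_nodes:
                break
            transfers.append((sender, next_node))
            have.append(next_node)
            next_node += 1
        rounds.append(transfers)
    return rounds


def sequential_rounds(num_nodes):
    """Rounds needed when the source alone sends to one node at a time."""
    return max(0, num_nodes - 1)


if __name__ == "__main__":
    for n in (4, 8, 100):
        print(n, len(spanning_tree_schedule(n)), sequential_rounds(n))
```

Under this model the number of rounds grows roughly with the logarithm of the node count, while sending everything from the source grows linearly. Real transfers also depend on link bandwidth and failures, which this sketch ignores.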

The third part is the computation over the distributed data. It is basically a vague description of the computation, …
