Towards Globally-Distributed Data

1 Global Horizons for Big Data
Big Data is increasingly behind major business decisions [14]. It is now common
for a medium-sized company to process terabytes of data a day, and for the
resulting analytics to support services and management decisions. This trend drives the
analytic database market, which is a major growth area in the multi-billion dollar
database market.
To tackle Big Data we must adopt scalable systems. Parallel databases are limited
in their scalability, since they cope poorly with the node failures that are common
at scales of more than a dozen nodes. Thus fault tolerance is key to scalability.
Google was among the first organisations to build systems that work with
data at such a scale. Thus the state of the art is heavily inspired by Google's
technology stack, namely Chubby [5], MapReduce [9], Google File System [10]
and BigTable [7]. These technologies enable data to be processed in a warehouse
scale datacentre, involving tens of thousands of machines persisting petabytes of
data. The academic community is also pushing Big Data further with projects
covering warehouse scale query processing [2, 16] and column stores [15, 1].
Google is now working with data on a still bigger scale. To cope with dis-
asters that damage or disrupt datacentres, Google’s flagship products, such as F1
(Google’s advertising backend), require data to be replicated globally [3, 8]. These
products are required to scale to millions of nodes across hundreds of datacentres.
Global replication brings data closer to consumers, thus improving read perfor-
mance. However, due to fluctuating high latency in a Wide Area Network, global
replication brings significant new challenges. In particular, minute attention to
time is required to maintain adequate consistency.
2 The State of the Art
In lectures, we have covered the key technologies that allow Google to process
data on a warehouse scale. Fault tolerant parallel computations are delivered by
MapReduce [9]. Fault tolerant distributed data persistence is provided by the
Google File System (GFS) [10]. Fault tolerant distributed indexing is provided
by BigTable [7]. Many of Google’s services, e.g. Google Personalized Search,
Google Analytics and Google Earth, fit a particular computational model. The
application uses BigTable to lookup data to extract from GFS, which is input into
a MapReduce job. The output of the MapReduce job is then persisted and indexed
using BigTable on top of GFS.
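The MapReduce stage of this pipeline can be illustrated with the canonical word-count example from [9]. The sketch below is a minimal in-memory simulation: the grouping step stands in for the distributed shuffle that a real cluster performs between the map and reduce phases.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit an intermediate (word, 1) pair for every word
    # in every input record.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle: group intermediate values by key, as the
    # framework would do across the network.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big scale", "data at scale"]
counts = reduce_phase(map_phase(docs))
# counts == {"big": 2, "data": 2, "scale": 2, "at": 1}
```

In the real system the input records would be read from GFS, the intermediate pairs spilled to local disk, and the final dictionary persisted back through BigTable.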
The open source community has developed several projects based on Google’s
scalable systems. Apache Hadoop, the Hadoop File System (HDFS) and HBase
are open source clones of Google’s MapReduce, GFS and BigTable respectively.
Thus anyone can deploy a clone of Google’s systems. For example, Facebook
deploy these Apache Hadoop technologies to manage their multi petabyte data
warehouses [4].
Hadoop, HDFS and HBase all hold data critical for operation and recovery at
a single master node. If the master node were to become unavailable, the whole
system would temporarily become unavailable. Furthermore, catastrophic failure
leading to data loss at the master node could lead to the entire system becoming
unrecoverable. Google achieves a highly available system by removing critical
single points of failure using the Chubby locking service [5]. Chubby removes
the single point of failure, while maintaining consistency, by using the Paxos
algorithm [13, 6]. Apache Zookeeper fulfils the role of Chubby in the Hadoop ecosystem.
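The single-decree Paxos round at the heart of a lock service like Chubby [13, 6] can be sketched in-memory as follows. This is a deliberately simplified illustration: a production implementation adds message passing, failure handling, durable state, and multi-decree log replication, all omitted here.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1        # highest proposal number promised
        self.accepted = None      # (number, value) accepted so far, if any

    def prepare(self, n):
        # Phase 1b: promise to ignore proposals numbered below n.
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        # Phase 2b: accept unless a higher-numbered promise was made.
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False

def propose(acceptors, n, value):
    # Phase 1a: send prepare(n) to all acceptors and collect promises.
    promises = [a.prepare(n) for a in acceptors]
    granted = [acc for ok, acc in promises if ok]
    if len(granted) <= len(acceptors) // 2:
        return None  # no majority of promises: the round fails
    # Safety rule: if any acceptor has already accepted a value,
    # the proposer must adopt the highest-numbered one.
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior)[1]
    # Phase 2a: ask the acceptors to accept (n, value).
    votes = sum(a.accept(n, value) for a in acceptors)
    return value if votes > len(acceptors) // 2 else None

acceptors = [Acceptor() for _ in range(5)]
chosen = propose(acceptors, 1, "lock-holder=node-42")
# chosen == "lock-holder=node-42"
```

Note how the safety rule ensures that a later proposer (say, with proposal number 2) cannot overwrite an already-chosen value: it is forced to re-propose it, which is precisely what makes the replicated state consistent despite failures.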
3 Towards Globally Distributed Data
It is your job to take our investigation a step further by studying Spanner [8] —
Google’s latest database technology. Spanner is a database that is designed to
meet the data management requirements of Google’s growing suite of applica-
tions. Google’s applications must not only operate at scale, but also must guaran-
tee high availability and prevent data loss. Google achieves these requirements by
replicating data across several datacentres located in distinct geographical regions.
BigTable is designed with the assumption that all nodes are connected on
a high speed network in a single warehouse scale datacentre. Thus BigTable
assumes that the round trip between any two nodes is less than a millisecond
(see [7] Section 7). When replicating data between datacentres this assumption
no longer holds. Google’s quick intermediate solution was Megastore [3], which
builds directly on top of several BigTable instances, running in each datacentre.
Megastore is adequate to support several of Google’s flagship products, including
Gmail, Picasa, Calendar, Android Market and Google AppEngine. However, the
performance of Megastore, in particular write throughput, is poor.
Megastore uses Paxos to replicate primary user data across datacentres on every
write [3]. Instead of building on top of BigTable, Spanner [8] is a redesign
of Google’s data infrastructure with global distribution at the core. Spanner im-
proves write performance while maintaining strong consistency. The notion of
strong consistency maintained is called external consistency. External consistency
is defined by Gifford [11] as follows:
External consistency guarantees that a transaction will always receive
current information. The actual time order in which transactions
complete defines a unique serial schedule. This serial schedule is called
the external schedule. A system is said to provide external consistency
if it guarantees that the schedule it will use to process a set of
transactions is equivalent to its external schedule.
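Gifford's definition can be restated operationally: the order in which the system serialises transactions must agree with the real-time order in which they completed. The hypothetical checker below makes this concrete; `schedule` and `finish_times` are illustrative names, not part of any system's API.

```python
def is_externally_consistent(schedule, finish_times):
    # The external schedule orders transactions by their actual
    # (wall-clock) completion times. The system's schedule is
    # externally consistent iff it matches that order.
    external_schedule = sorted(schedule, key=lambda txn: finish_times[txn])
    return schedule == external_schedule

finish_times = {"T1": 10.0, "T2": 11.5, "T3": 12.0}
assert is_externally_consistent(["T1", "T2", "T3"], finish_times)
assert not is_externally_consistent(["T2", "T1", "T3"], finish_times)
```

The difficulty in a distributed system is that no node knows `finish_times` exactly; every node's clock carries some error, which is the problem TrueTime addresses.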
Spanner implements a TrueTime API that uses multiple clock references to
keep the global error on actual time within 10 milliseconds. TrueTime is critical
to supporting external consistency, while maintaining good write performance.
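The Spanner paper ([8] Section 3) describes TrueTime as returning an uncertainty interval rather than a single timestamp, and "commit wait" as pausing until a chosen commit timestamp is guaranteed to be in the past on every node. The sketch below simulates that behaviour; the fixed 7 ms uncertainty bound and the function names are illustrative assumptions, not values or identifiers from the paper.

```python
import time

EPSILON = 0.007  # assumed clock uncertainty bound in seconds (illustrative)

def tt_now():
    # TrueTime returns an interval [earliest, latest] that is
    # guaranteed to contain the true absolute time.
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def tt_after(t):
    # True once t is definitely in the past, even for the node
    # with the fastest-running clock.
    earliest, _ = tt_now()
    return earliest > t

def commit_wait(commit_timestamp):
    # Commit wait: do not release locks or acknowledge the commit
    # until commit_timestamp has certainly passed everywhere. This
    # is what makes timestamp order match real-time order.
    while not tt_after(commit_timestamp):
        time.sleep(0.001)

# A leader picks the latest possible current time as the commit
# timestamp, then waits out the uncertainty before acknowledging.
_, commit_ts = tt_now()
commit_wait(commit_ts)
```

The cost of commit wait is proportional to the clock uncertainty, which is why keeping the error bound small (via GPS and atomic clock references) directly improves write latency.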
Measuring time is known to be important for guaranteeing reasonable progress
for distributed algorithms [12]; however, Spanner is one of the few systems where
the finest attention to time is critical. The critical nature of TrueTime in Spanner
gives us new insight into distributed systems as concluded by Corbett et al. [8]:
As a community, we should no longer depend on loosely synchro-
nised clocks and weak time APIs in designing distributed algorithms.
Thus Spanner is a milestone for computer science that demonstrates that time
should be part of the design of globally distributed algorithms.
4 Your Challenge
Your work should be submitted as a technical report written in English. The page
limit is 10 pages. In your report, put yourself in the situation of a systems analyst
who is extracting the technical requirements of a scalable, globally distributed, data-centric
system inspired by Google Spanner. You should address the following four points:
• Briefly outline the state of the art for working with Big Data and explain
the role of Paxos in BigTable. Also speculate about how Paxos can be used
to make the master node more reliable in MapReduce. Remember that the
master node holds critical routing and status information about all the nodes
and tasks involved in a MapReduce job. You will require the papers on
BigTable [7], Chubby [5] and MapReduce [9] to answer this question.
• Read the paper on Spanner [8]. Use the paper to explain how Spanner uses a
modified Paxos algorithm to replicate primary user data across globally distributed
datacentres. Describe how this modified Paxos algorithm is different
from the basic Paxos algorithm [13]. Please use your recently obtained
background knowledge to expand beyond the explanation of Paxos given
in [8] Section 2. In particular, provide details about how the communication
phases of the modified Paxos algorithm used in Google Spanner work.
• Explain what the TrueTime API is and how TrueTime is used to improve the
write performance of Spanner while maintaining external consistency. In
particular, identify how TrueTime is used in the Paxos algorithm described
in the previous bullet point. You should interpret the paper on Spanner so
that developers wishing to implement an (open source) clone of Spanner
can identify the requirements. TrueTime and its usage are covered in
Sections 3 and 4 of [8].
• Conclude by comparing the use of Paxos in Spanner to the use of Paxos in
BigTable. Argue the case for developing an open source clone of Spanner
by highlighting the benefits of global distribution. Finally, summarise the
technical challenges identified in the body of the report for developing such
a Spanner clone.
The deadline for this report is Tuesday 14th May. Please start reading the
paper on Spanner as soon as possible — it will take several readings to understand.
Also, please contact me before the deadline if you have difficulty understanding
any of the points above.

