Distributed And Cloud-Based Storage Systems (818e)

CSI 2118, TTh 9:30-10:45
Fall 2015

(Syllabus, Piazza)

This web page and schedule will be updated as the course goes along. Please check it regularly, and don't forget to reload.


The guiding philosophy of this course is that the best way to learn about real systems is to build one. We will gain an in-depth understanding of the issues involved in designing and deploying large-scale distributed file systems. In the course of this investigation we will be tackling a variety of topics, such as peer-to-peer systems, remote procedure calls, multi-threading, consensus protocols, cloud systems, layered systems (supporting high-level consistency guarantees on top of cloud services), and security as it relates to such systems.

The class will consist of lectures by the instructor, student project presentations, a midterm and a final, and a series of programming projects (probably five), all in the Go language (fear not if you don't know any Go; we'll all be learning together). The end goal is to have built a full-scale, reliable, highly available, and secure distributed file system, using both local disks and cloud services as backing stores. My lectures will be split between those describing the tools we will use to build our file systems and those based on recent research in the literature (such as papers from FAST 2014, SOSP 2013, OSDI 2012, and USENIX ATC 2014).

Examples of technologies we may use include FUSE (and MacFUSE), key-value stores like Bolt, gkvlite, diskv, or leveldb-go, the Amazon Simple Storage Service (and a Go binding), Google's Protocol Buffers or JSON (from Go), Google's Go language, Paxos, SQLite, Snappy, and Apple's development kit for the iPad.

Office hours: after class in my office (4157 A.V. Williams).

Note that the following set of papers is only a placeholder: more will come, some will go away.

Sep 1 Intro/Overview/FUSE tutorial
Sep 3 Go
Sep 8 Fault Tolerance and Security
The Design and Implementation of a Log-Structured File System

A Low-bandwidth Network File System

Sep 10 Deciding when to forget in the Elephant file system

Provenance-Aware Storage Systems.

Sep 15 Project 1 due yesterday
Intrusion Recovery Using Selective Re-execution

Intrusion Recovery for Database-backed Web Applications

Sep 17 The Google File System

GFS: Evolution on Fast-forward

Sep 22 Git Internals - no blog, 9.2 - 9.4

Replication, history, and grafting in the Ori file system

Sep 24 Semantic File Systems

A Logic File System

Sep 29 Distributed Object Stores, and Consistency
Time, clocks, and the ordering of events in a distributed system

Distributed Snapshots: Determining Global States of Distributed Systems

Oct 1 Project 2 due Sunday Night
Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System

Session Guarantees for Weakly Consistent Replicated Data
(optional; sections 4 and 5 especially are not necessary)

Oct 6 Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

OceanStore: An Architecture for Global-Scale Persistent Storage

Oct 8 Sinfonia: A New Paradigm for Building Scalable Distributed Systems

Dynamo: Amazon's Highly Available Key-value Store

Oct 13 Eventual and Causal Consistency

Eventually Consistent

IPFS - Content Addressed, Versioned, P2P File System

Optional, no blog: Camlistore is your personal storage system for life

Oct 15

ACID, and Transactional Consistency


Oct 20 Don't Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS

Scalable Consistency in Scatter

Oct 22 Project 3 due Sunday Night
Transactional storage for geo-replicated systems
Oct 27 Highly Available Transactions: Virtues and Limitations

(the above is prettier if you have a browser extension for markdown)

Oct 29 Pick two of three:
 Quantifying Eventual Consistency with PBS

 Bolt-on causal consistency

 Coordination Avoidance in Database Systems

pbs.md (I use Markdown Preview in Mac Chrome to view).

Nov 3 Spanner: Google's Globally-Distributed Database

off-topic: Towards an archival intermemory

Note that I've changed the intermemory paper. This is better than the prototype paper, but reading either is sufficient for this lecture.


Nov 5 Fault Tolerance and Security
Notes on fault tolerance in distributed, unreliable, asynchronous environments.


Nov 10 Crypto
Nov 12 Consensus


Nov 17 Read a few pages of The Part-Time Parliament, then go to Wikipedia.

In search of an understandable consensus algorithm

There will be a quiz.

Very optional:
 Paxos Made Simple
 Paxos Made Practical
 Paxos Made Live: an Engineering Perspective
 Paxos Made Transparent
 Paxos Made Moderately Complex

Slides and Quizzes

Nov 19 Project 4 due Sunday Night
Practical Byzantine Fault Tolerance

Peer Review

Nov 24 Capability Myths Demolished

Chit-Based Access Control


Nov 26 Thanksgiving
Dec 1 Secure Untrusted Data Repository

SPORC: Group Collaboration using Untrusted Cloud Resources

Dec 3

The Case for RAMCloud
(optional: Log-structured Memory for DRAM-based Storage)


Dec 8 Perspective: Semantic data management for the home

Cimbiosys: A platform for content-based partial replication

Dec 10 Project 5 due Sunday Night

Extracting Guarantees from Chaos


Project   Description                              Due Date   Errata
P-1       In-memory file system                    9/10
P-2       Persistence, and a Cloud Client          10/4
P-3       Versioning and Semantics File Systems    10/25


Please read the statement on academic integrity.