34th International Conference
on Massive Storage Systems
and Technology (MSST 2018)
May 14 — 16, 2018

Sponsored by Santa Clara University,
School of Engineering


Since the conference was founded by the leading national laboratories, MSST has been a venue for massive-scale storage system designers and implementers, storage architects, researchers, and vendors to share best practices and discuss building and securing the world's largest storage systems for high-performance computing, web-scale systems, and enterprises.
    



Hosted at
Santa Clara University
Santa Clara, CA


2018 Conference


This year, MSST will focus on distributed storage system technologies, including persistent memory, long-term data retention (tape, optical disks...), solid-state storage (flash, MRAM, RRAM...), software-defined storage, OS- and file-system technologies, cloud storage, big data, and data centers (private and public). Sessions will address current challenges and future trends in these technologies.

MSST 2018 will include a day of tutorials and two days of invited papers. The conference will be held, once again, on the beautiful campus of Santa Clara University, in the heart of Silicon Valley. Registration information is below.

Access the 2017 program, with links to papers, presentation slides, and video here.

Santa Clara University


— 2018 Registration —

(Register for one or both tracks.)
Tutorial (1 day): $70.00
Invited Track (2 days): $140.00

Register Here

Note: Early registration ends May 7th.


Logistics


Venue: Locatelli Center on the Santa Clara University Campus (map)

Parking: Daily and multi-day permits are available for purchase at the main gate at 500 El Camino Real ($8/day; the attendant will direct you to the Locatelli Center/Leavey Center parking lot), or daily permits may be purchased for $5 at an unmanned kiosk in the parking lot. (map)

Driving Directions (to the campus)

Walking Directions (on campus)

Hotels near the campus
(To reduce your attendance fees, there is no
"conference hotel", so you can choose where to stay.)



Subscribe to our email list for (infrequent) information along the way.



2018 Program

Tutorial, Monday, May 14th
9:00am — 1:00pm

LOCKSS (Lots of Copies Keeps Stuff Safe)
Workshop on Distributed Digital Preservation


This workshop will focus on:

  1. Trends in digital preservation
  2. Information resources
  3. Distributed digital preservation concepts
  4. LOCKSS preservation features and storage framework
  5. The methodology of the LOCKSS open source software re-architecture
  6. Open discussion: Opportunities for technology collaboration

LOCKSS Background:


LOCKSS is a leading technology for peer-to-peer distributed digital preservation. LOCKSS Java software is going through a major update and revision that will affect existing LOCKSS users and also future community innovators. Originally designed and launched in the late 1990s, LOCKSS was utilized in the library community to help maintain and preserve eBooks and eJournals. The present software re-design will change not only the focus and capabilities of the LOCKSS technology, but also the collaborative partnerships, support structure, and market focus. It will make LOCKSS technology applicable to new types of users, integrations, and content.

Instructors:


Thib Guicherd-Callin, LOCKSS Development Manager, Stanford Libraries (bio)

Nicholas Taylor, Program Manager, LOCKSS and Web Archiving, Stanford Libraries (bio)

Art Pasquinelli, LOCKSS Partnership Manager, Stanford Libraries, Preservation and
Archiving Special Interest Group (PASIG) Steering Committee (bio)


Tutorial, Monday, May 14th
2:00pm — 6:00pm

Big Data for Big Problems
(and the technology that underpins it)


This tutorial will present a Big Data Analytics project as a case study in handling big data and the software technology used to construct it. The platform is the key component of a comprehensive research program at Georgetown University to enable transformative multidisciplinary, integrative research: developing research methodologies and computational techniques to model, process, and analyze massive amounts of data efficiently while assuring privacy. Target technologies that will be covered include big data, data privacy, and high-assurance systems.

This class is for data scientists, system engineers and technical team managers.

Tutorial objectives:

Class Structure:

  1. Overview of the architecture
  2. The Toolkit
  3. The Visualization Utility (Advance VU)
  4. The ontology and taxonomies
  5. The road-map for the future, questions, and discussion

Instructors:


Norman R. Kraft, Georgetown University

Helen E. Karn, Georgetown University

Stephen Baird, AdaCore


Invited Track, Tuesday, May 15th
8:00 — 9:00 Registration / Breakfast
Welcome Address
Dr. Alfonso Ortega, Dean, School of Engineering, Santa Clara University
Keynote
Persistent Memory, NVM Programming Model and NVDIMMs
Dr. Thomas Coughlin, President, Coughlin Associates
Extreme Scale Blocks, Files, and Backup
Extreme-scale Block: How to Make Scalable Block Protocol Systems
Josh Goldenhar, VP, Excelero
Extreme-scale File: How to Massively Scale Cloud File Name Spaces
David Payne, VP, Elastifile
Extreme-scale Backup
Gleb Budman, CEO, BackBlaze
12:30 — 1:30 Lunch
Non-Volatile Memory API
Chair: Glenn Lockwood
Programming Models for Accessing NVM Over RDMA
Megan Grodowitz, Arm
Application programming models for using RDMA capable networks have begun to incorporate support for persistent memory. Because RDMA APIs have been designed under the assumption that the remote data being accessed is, in fact, byte addressable memory, the facility to use NVM as storage (or something storage-like) opens up new challenges and opportunities. This talk will give a comparison and overview of the current state of libraries and programming models for NVM access over RDMA networks.
Incorporating NVM into Data-Intensive Scientific Computing
Dr. Philip Carns, Argonne National Laboratory
Two concurrent trends are motivating the HPC community to rethink scientific data service architectures: the emergence of NVM devices with radically different performance characteristics, and a growing interest in specialized data services that provide performance, convenience, or features beyond those of a conventional file system. The convergence of these trends will bring about a fundamental shift in the productivity of data-intensive scientific computing, but only if we capitalize on NVM characteristics through the use of efficient, portable, and flexible interfaces that complement HPC network and CPU capabilities. This talk will highlight those challenges from an HPC perspective and discuss how the state of the practice can be adapted to meet them.
Programming with Persistent Fabric-Attached Memory
Dr. Kimberly Keeton, Hewlett Packard Enterprise
Recent technology advances in high-density, byte-addressable non-volatile memory (NVM) and low-latency interconnects have enabled building large-scale systems with a large disaggregated fabric-attached memory (FAM) pool shared across heterogeneous and decentralized compute nodes. In this model, compute nodes are decoupled from FAM, which allows separate evolution and scaling of processing and fabric-attached memory. The large capacity of the FAM pool means that large working sets can be maintained as in-memory data structures. The fact that all compute nodes share a common view of memory means that data sharing and communication may be done efficiently through shared memory, without requiring explicit messages to be sent over heavyweight network protocol stacks. Additionally, data sets no longer need to be partitioned between compute nodes, as is typically done in clustered environments. Any compute node can operate on any data item, which enables more dynamic and flexible load balancing.

This talk will describe the OpenFAM API, an API for programming with persistent FAM that is inspired by partitioned global address space (PGAS) models. Unlike traditional PGAS models, where each node contributes local memory toward a logically shared global address space, FAM isn't associated with a particular node and can be addressed directly from any node without the cooperation or involvement of another node. The OpenFAM API enables programmers to manage memory allocations, access FAM-resident data structures, and order FAM operations. Because state in FAM can survive program termination, the API also provides interfaces for naming and managing data beyond the lifetime of a single program invocation.
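To make the model above concrete, the following C-style sketch shows the shape of a PGAS-like fabric-attached-memory workflow: allocate a named region in the shared FAM pool, write and read it from any node, and order the operations with a fence. The function names and types here are illustrative placeholders invented for this sketch; they are not the actual OpenFAM calls or signatures.

    /* Illustrative placeholders only; not the real OpenFAM API. */
    #include <stdint.h>
    #include <stddef.h>

    typedef struct fam_region fam_region;    /* handle to a named FAM allocation */

    /* Hypothetical operations standing in for allocate/lookup/put/get/order. */
    fam_region *fam_alloc_named(const char *name, size_t nbytes);
    fam_region *fam_lookup_named(const char *name);
    void fam_put(fam_region *r, size_t offset, const void *src, size_t nbytes);
    void fam_get(fam_region *r, size_t offset, void *dst, size_t nbytes);
    void fam_fence(void);                     /* order preceding FAM operations */

    void producer_node(void)
    {
        /* Any compute node may create the allocation; no "owner" node is involved. */
        fam_region *results = fam_alloc_named("experiment/results", 1 << 20);
        uint64_t record = 42;
        fam_put(results, 0, &record, sizeof(record));
        fam_fence();                          /* make the write visible, in order */
    }

    void consumer_node(void)
    {
        /* A different node, possibly in a later program run, re-attaches by name,
           because data in FAM can outlive any single program invocation. */
        fam_region *results = fam_lookup_named("experiment/results");
        uint64_t record;
        fam_get(results, 0, &record, sizeof(record));
    }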
APIs for Persistent Memory Programming
Andy Rudoff, Intel
This talk will cover the current state of persistent memory APIs available on various operating systems. It will describe the low-level APIs provided by the operating system vendors, as well as higher level libraries and language support, covering a variety of use cases for persistent memory.
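As one concrete example of the kind of low-level API the talk surveys, the minimal sketch below uses PMDK's libpmem on Linux: a file on a DAX-capable persistent-memory filesystem is mapped directly into the address space, updated with ordinary stores, and then flushed to the persistence domain. The path /mnt/pmem/example is an assumption made for illustration; the same pattern applies wherever the mapping lives.

    /* Minimal libpmem (PMDK) sketch; the file path is an assumed example. */
    #include <stdio.h>
    #include <string.h>
    #include <libpmem.h>

    int main(void)
    {
        size_t mapped_len;
        int is_pmem;

        /* Map (creating if needed) a file directly into the address space. */
        char *addr = pmem_map_file("/mnt/pmem/example", 4096, PMEM_FILE_CREATE,
                                   0666, &mapped_len, &is_pmem);
        if (addr == NULL) {
            perror("pmem_map_file");
            return 1;
        }

        /* Update through ordinary loads and stores; no read()/write() calls. */
        strcpy(addr, "hello, persistent memory");

        /* Flush CPU caches so the update reaches the persistence domain. */
        if (is_pmem)
            pmem_persist(addr, mapped_len);
        else
            pmem_msync(addr, mapped_len);    /* fall back when not true pmem */

        pmem_unmap(addr, mapped_len);
        return 0;
    }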
Short Talks
Attendees and vendors can sign up in advance, or at the conference, to give 5-15 minute
works-in-progress or summary updates on work of interest to conference attendees.
Modernizing Xroot Protocol
Dr. Michal Kamil Simon, CERN


Invited Track, Wednesday, May 16th
8:00 — 9:00 Breakfast
Keynote
Memory Technologies and the Evolution of Distributed Storage Software
Dr. Peter Braam, Campaign Storage, LLC (bio)
A whirlwind of new memory and storage devices has begun to change data centers and will continue to do so. Handling multiple storage tiers, performance aligned with that of memory, and new consistency models illustrate the breadth of new requirements for storage software. In the context of large-scale HPC, we will review how storage systems have changed and overcome many difficulties. From there we will look at what is planned and anticipated going forward, indicating roles for technologies such as containers, file systems, object storage, and access libraries. This is an area with many exciting opportunities, and presently only a handful of solutions are available or under development.
Managing Extreme Scale Storage Environments
Scale Challenges of the MeerKAT Radio Telescope
Dr. Simon Ratcliffe, Square Kilometre Array
Building Extreme-Scale File Name Spaces in the Oracle Public Cloud
Ed Beauvais, Oracle
HPC Storage, Machine Learning and Adaptation with Respect to Job Scheduling
Alan Poston, Hewlett Packard Enterprise
The Medium-Term Prospects For Long-Term Storage
David Rosenthal
At scale, storage is organized as a hierarchy, with small amounts of "hot" storage at the top and large amounts of "cold" storage at the bottom. The hot layers have been evolving rapidly as flash displaces spinning disk; the cold layers, not so much. Will this change in the medium term? What are the factors driving this part of the storage market?
12:30 — 1:30 Lunch
Panel: Metadata Management at Scale
Chair: Wendy Poole
Managing Lustre Metadata with HDFS
Aaron Steichen, ExxonMobil Technical Computing Company
HPC applications generate massive amounts of files and data. This presents challenges for managing the file systems. The MySQL-based Robinhood Policy Engine is not an ideal fit for custom queries and becomes harder to rely on as inode count increases. We wanted a quicker, more flexible way to analyze the data on our Lustre file systems. We attempted to replace Robinhood's functionality with an HDFS solution. We found that while HDFS was not a good fit for the depth first searches required to rebuild file paths, the custom queries ran orders of magnitude faster in HDFS once the paths were pre-generated in MySQL. The increased speed of the queries and flexibility of the data structure has allowed us to run queries that were not feasible before putting the data in HDFS.
JGI Archive and Metadata Organizer (JAMO)
Chris Beecroft, Lawrence Berkeley Laboratory (bio)
The Department of Energy Joint Genome Institute (JGI) is a national user facility that generates petabytes of data from instruments and analysis. Over the 2000-2018 timeframe, the JGI has experienced exponential growth in data generation. In 2013, the JGI deployed a hierarchical data management system to handle this data deluge. This system, called the JGI Archive and Metadata Organizer (JAMO), enables JGI staff and scientists to write pipelines that automatically associate the files generated by instruments and analysis pipelines with a rich set of metadata. The JAMO system has saved the JGI countless FTE hours that were historically spent trying to locate data on various storage systems for sharing internally or with collaborators. In this talk I will provide a high-level overview of the system and how it was deployed at the JGI.
Metadata in Feature Animation Film Production
Scott Miller, Dreamworks (bio)
Visual complexity, audience expectations, and competition for eyeballs are all increasing. Metadata and analytics are driving efficiency in character and environmental design, overall film design, application implementation, resource scheduling, and workflow management to help create even more compelling feature animated films than before. This talk provides a brief glimpse into the filmmaking process and how metadata is making a difference.
Massive Scale Metadata Efforts and Solutions
Dave Bonnie, Los Alamos National Laboratory (bio)
This talk will provide an overview of scalable metadata management solutions in research, development, and production at Los Alamos National Laboratory, covering three primary efforts. The Grand Unified File Indexing (GUFI) system is a hybrid indexing capability that uses both file system trees and embedded SQL to provide fast, efficient file metadata indexing usable by both system administrators and users, thanks to its unique approach to securing access to the index. Delta-FS is a user-space namespace that borrows concepts from git to let applications "check out" name spaces and "merge" name-space changes, enabling namespace operations to scale with the application. The Hexa-Dimensional Hashing Indexing Middleware (HXHIM) is a user-space, linkable, parallel, multi-dimensional key-value store framework that lets applications plug in their favorite multi-dimensional key-value store or database and have hundreds to thousands of copies instantiated in parallel, forming a distributed, parallel, multi-dimensional indexing capability.
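To illustrate the per-directory embedded-SQL idea behind GUFI, the sketch below queries a small SQLite database assumed to live alongside a single directory and describe its entries. The database filename and the entries(name, size, mtime) schema are assumptions made for this example, not GUFI's actual on-disk format or tooling.

    /* Illustrative per-directory index query; schema and filename are assumed. */
    #include <stdio.h>
    #include <sqlite3.h>

    int main(int argc, char **argv)
    {
        const char *dbpath = (argc > 1) ? argv[1] : "./.dirindex.db";
        sqlite3 *db;
        sqlite3_stmt *stmt;

        if (sqlite3_open(dbpath, &db) != SQLITE_OK) {
            fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
            return 1;
        }

        /* Ask one directory's index for its ten most recently modified large files;
           a tree walk would fan this query out across many such databases. */
        const char *sql =
            "SELECT name, size, mtime FROM entries "
            "WHERE size > 1073741824 ORDER BY mtime DESC LIMIT 10;";

        if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) == SQLITE_OK) {
            while (sqlite3_step(stmt) == SQLITE_ROW)
                printf("%s %lld %lld\n",
                       (const char *)sqlite3_column_text(stmt, 0),
                       (long long)sqlite3_column_int64(stmt, 1),
                       (long long)sqlite3_column_int64(stmt, 2));
            sqlite3_finalize(stmt);
        }
        sqlite3_close(db);
        return 0;
    }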
Write-Optimization for Metadata
Dr. Rob Johnson, Stony Brook University (bio)
This talk will describe how BetrFS, a file system built from the ground up on write-optimized data structures, uses write-optimization to accelerate file-system metadata operations, such as atime updates and file and directory creations, renames, and deletes. BetrFS offers orders-of-magnitude performance improvements on some of these operations compared to conventional "update-in-place" file systems, without suffering from the fragmentation challenges of log-structured file systems.

The talk will also discuss challenges and opportunities for using write-optimization in file systems to accelerate application-level metadata maintenance.
Short Talks
Attendees and vendors can sign up in advance, or at the conference, to give 5-15 minute
works-in-progress or summary updates on work of interest to conference attendees.
DNA's Niche in the Storage Market
David Rosenthal
DNA has many attractive properties as an archival storage medium, being extremely dense, very stable in shirt-sleeve environments, and very cheap to replicate so Lots Of Copies Keep Stuff Safe. Since 2012, both the popular and technical presses have hyped the various lab demonstrations of writing and reading data using DNA as a medium. How does this technology work? What needs to happen to move it from the labs to the market?
File Transfer Service at Exabyte Scale
Dr. Michal Kamil Simon, CERN


2018 Organizers
Conference Co-Chairs     Dr. Ahmed Amer,  Dr. Sam Coleman
Program Committee     Gary Grider, Dr. Matthew O'Keefe, Arthur Pasquinelli, Gaz Salih
Industry Chair     Arthur Pasquinelli
Communications Chair     Meghan Wingate McClelland
Registration Chair     Yi Fang


Page Updated April 22, 2018