Tech Trends For Fall Reading: Software Transactional Memory, Cloud Computing Storage, and more

Summer is winding down, the tech industry is getting back to work, and new product and service announcements are coming quickly. Why not do some good reading now to prepare for the fall, when everyone returns from vacation and you get back to the serious business of deadlines, programming, and, of course, geeky arguments about the topics of the day? Here is a good reading list to bookmark.

Get up to speed on Generic Programming, or Programming In General

My first recommendation is the collected papers of Alexander Stepanov, which you can get from his website entitled… Collected Papers of Alexander Stepanov. For those who don’t know, Stepanov is the key person behind the C++ Standard Template Library, which he started to develop around 1993. He had earlier worked at Bell Labs alongside Andrew Koenig and tried to convince Bjarne Stroustrup to introduce something like Ada generics into C++. His papers are a treasure trove of thought on generic programming, logic, robotics and anything else that made you turn to the Computer Science page in your university’s catalog. Best of all, he also provides slides for his book in progress, written with Paul McJones, called Programming Elements. This is a great way to refresh your knowledge of abstract and concrete concepts in quick and easy PowerPoint format. Just take a look at the table of contents and I dare you not to click on at least one of the chapter links. Don’t worry, I won’t tell.

The “Core” Debate of the Community: Concurrent Programming and Software Transactional Memory

Yes, the pun was bad. It does, however, point to a problem that has academic and commercial researchers alike burning up, and that is responsible for the large number of papers flooding the ACM portal: Software Transactional Memory (STM). Well, actually, STM is a possible answer to the problem, not the problem itself, though the two are often conflated. The problem is that now that Intel and AMD have started putting more cores onto a single chip, we have to deal with what comes along with that: coordinating the threads on multiple cores that are all trying to work on the same data on behalf of the same program. It also scales up into bigger versions of the same problem, for any kind of work you may want to farm off to “locales” that may need to cross machine boundaries and still operate on the same data within a transaction. (For more on some of this, see my post from the Google Scalability Conference regarding Cray’s work to replace MPI with the new concurrent language Chapel, and the GIGA+ file system below.)

I think Simon Peyton Jones of Microsoft Research in Cambridge illustrates it best in his paper Composable Memory Transactions (PPoPP ’05):

The dominant programming technique is based on locks, an approach that is simple and direct, but that simply does not scale with program size and complexity. To ensure correctness, programmers must identify which operations conflict; to ensure liveness, they must avoid introducing deadlock; to ensure good performance, they must balance the granularity at which locking is performed against the costs of fine-grain locking. Perhaps the most fundamental objection, though, is that lock-based programs do not compose: correct fragments may fail when combined. For example, consider a hash table with thread-safe insert and delete operations. Now suppose that we want to delete one item A from table t1, and insert it into table t2; but the intermediate state (in which neither table contains the item) must not be visible to other threads. Unless the implementor of the hash table anticipates this need, there is simply no way to satisfy this requirement. Even if she does, all she can do is expose methods such as LockTable and UnlockTable but as well as breaking the hash-table abstraction, they invite lock-induced deadlock, depending on the order in which the client takes the locks, or race conditions if the client forgets. Yet more complexity is required if the client wants to await the presence of A in t1, but this blocking behaviour must not lock the table (else A cannot be inserted). In short, operations that are individually correct (insert, delete) cannot be composed into larger correct operations.
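
To make the quote concrete, here is a minimal sketch of my own (not from the paper) of the lock-based version of that two-table scenario in Haskell, using an MVar-guarded map as the hash table; the names Table, deleteItem, insertItem and moveItem are hypothetical, made up for illustration. Each operation is correct on its own, but their composition is not:

    import Control.Concurrent.MVar
    import qualified Data.Map as Map

    -- A table guarded by a single lock: the MVar is both the lock and the data.
    type Table k v = MVar (Map.Map k v)

    -- Individually thread-safe delete; returns the removed value, if any.
    deleteItem :: Ord k => Table k v -> k -> IO (Maybe v)
    deleteItem t k = modifyMVar t $ \m -> return (Map.delete k m, Map.lookup k m)

    -- Individually thread-safe insert.
    insertItem :: Ord k => Table k v -> k -> v -> IO ()
    insertItem t k v = modifyMVar_ t (return . Map.insert k v)

    -- NOT atomic as a whole: between the delete and the insert another thread
    -- can observe a state in which the item is in neither table, and the only
    -- fix is to expose the table's lock, breaking the abstraction.
    moveItem :: Ord k => Table k v -> Table k v -> k -> IO ()
    moveItem t1 t2 k = do
      mv <- deleteItem t1 k
      case mv of
        Just v  -> insertItem t2 k v
        Nothing -> return ()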

The most that has come out of this so far is that we know it’s a problem and we’d love to be able to wrap our transactional code in an “atomic” keyword in our languages. Beyond that, it’s a lot of hand waving and PowerPoint slides. Some people, though, are actually trying to work it out. The best starting point here is the work of the aforementioned researcher Simon Peyton Jones. His collection of papers on STM offers a good overview of the problem and of some possible solutions. In his papers he uses Haskell, building on Concurrent Haskell, and Haskell lends itself to STM for reasons I won’t go into here. It will be quite a bit more of a challenge to get the same functionality into Java and C#, but there is already an API for C# Software Transactional Memory from Microsoft Research that you may want to explore.
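
For comparison, here is the same two-table move sketched with the STM support that ships with GHC (Control.Concurrent.STM), reusing the hypothetical names from the lock-based sketch above. The point is that atomically composes the individually correct operations into a single transaction, and retry gives the “wait until A appears in t1” behaviour from the quote without locking the table while waiting:

    import Control.Concurrent.STM
    import qualified Data.Map as Map

    -- The table is now a transactional variable rather than a lock.
    type Table k v = TVar (Map.Map k v)

    deleteItem :: Ord k => Table k v -> k -> STM (Maybe v)
    deleteItem t k = do
      m <- readTVar t
      writeTVar t (Map.delete k m)
      return (Map.lookup k m)

    insertItem :: Ord k => Table k v -> k -> v -> STM ()
    insertItem t k v = do
      m <- readTVar t
      writeTVar t (Map.insert k v m)

    -- atomically runs the whole block as one transaction, so no other thread
    -- ever sees the item missing from both tables; retry blocks until the item
    -- shows up in t1 without holding any lock while we wait.
    moveItem :: Ord k => Table k v -> Table k v -> k -> IO ()
    moveItem t1 t2 k = atomically $ do
      mv <- deleteItem t1 k
      case mv of
        Just v  -> insertItem t2 k v
        Nothing -> retry

The “atomic” keyword everyone keeps asking for is essentially atomically here, provided as a library function rather than as language syntax.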

If you don’t care about this, just don’t go naming classes atomic and you should be fine.

Storing The Cloud: How Do We Scale?

Solid state (read: flash) drives aren’t the only thing showing the age of our old file system technologies. As we expose software as services and begin taking on large numbers of tenants, cloud computing needs clusters with thousands of nodes, which, combined with the multi-core technology mentioned above, will pose a real challenge for storage systems. We will need the ability to scale to handle data generated by applications executing in parallel across tens of thousands of threads. Some solutions have been proposed, such as IBM’s General Parallel File System (GPFS) and Microsoft Research’s Boxwood technology.

I was lucky enough to watch a presentation on GIGA+, another solution being researched by Swapnil V. Patil at Carnegie Mellon University. One of its neatest ideas is leaving the header table behind and using a bitmap instead. I got to sit down with him afterward and talk about the challenges we face in this space; it was a great time. His primary concern about GPFS and Boxwood is their use of hashing and B-trees, which opens the door to bottlenecks and synchronization issues. By using a bitmap, and keeping it small so that it can be shared across nodes easily, GIGA+ eliminates the need for “metanodes” or other controllers in the HPC storage architecture.
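
To give a flavor of the bitmap idea, here is a toy sketch of my own, not the actual GIGA+ algorithm or layout (hashName, Bitmap and partitionFor are made-up names): if bit i of a small bitmap means “hash partition i exists,” then any node holding a copy of the bitmap can map a filename to a partition locally, by trying the finest split level first and falling back to coarser ones, with no central metanode in the lookup path.

    import Data.Bits (testBit, (.&.), shiftL)
    import Data.Char (ord)
    import Data.Word (Word64)

    -- Hypothetical toy filename hash; a real system would use a proper hash.
    hashName :: String -> Word64
    hashName = foldl (\h c -> h * 131 + fromIntegral (ord c)) 5381

    -- Bit i set means hash partition i exists.
    type Bitmap = Integer

    -- Map a name to a partition: take the low d bits of the hash at the finest
    -- split level and fall back to coarser levels until the bitmap says the
    -- partition exists. Partition 0 always exists, so the search terminates.
    partitionFor :: Int -> Bitmap -> String -> Int
    partitionFor maxDepth bitmap name = go maxDepth
      where
        h = hashName name
        go 0 = 0
        go d =
          let idx = fromIntegral (h .&. ((1 `shiftL` d) - 1))
          in  if testBit bitmap idx then idx else go (d - 1)

Because the whole map is just a handful of bits per directory, it is cheap to copy between nodes, which is exactly what makes a small bitmap attractive compared with a shared header table or B-tree.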

His paper, GIGA+: Scalable Directories for Shared File Systems, is a great read for anyone interested in the problems of high-performance computing and storage. The work seeks to maintain UNIX file system semantics, however, so those who care about scaling Microsoft infrastructure may find less to enjoy, but the overall architecture and the problems outlined in the paper are applicable to any massively large storage cluster.

Enough Already

That should be enough to get you through August. When your boss comes back from his Alaskan cruise, nothing will ensure he leaves you alone more than talking about Concurrent Haskell or how much you enjoyed Chapter 9 of Programming Elements: Algorithms on increasing ranges. Enjoy the air conditioning, you lucky bums.