Reuven Cohen's recent blog post asks "how can we avoid the full system reboot" for cloud infrastructure, such as the one that recently happened due to a bug in Amazon's S3 gossip protocol.
Inter-machine "gossiping" is typically done with homegrown, open source, or commercial RPC or publish/subscribe message queue middleware. Facebook uses Thrift, Google uses Protocol Buffers with a homegrown RPC, eBay uses TIBCO Rendezvous, and others try to get by with Apache ActiveMQ or something similar in their favorite language (such as Starling for the Ruby crowd). Some people prefer to allow "gossiping" at a higher level of abstraction, such as ESBs and data grids.
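For readers unfamiliar with the term, gossip protocols spread state epidemically: each node that has seen an update periodically forwards it to a few randomly chosen peers, so knowledge reaches the whole cluster in a logarithmic number of rounds. Here is a minimal push-style simulation to illustrate the idea (the node count, fanout, and push-only design are illustrative assumptions, not a description of Amazon's actual protocol):

```python
import random

def gossip_rounds(num_nodes, fanout=3, seed=0):
    """Simulate push gossip: count rounds until one update reaches all nodes."""
    rng = random.Random(seed)
    informed = {0}  # node 0 originates the update
    rounds = 0
    while len(informed) < num_nodes:
        # Every informed node pushes the update to `fanout` random peers.
        for node in list(informed):
            informed.update(rng.sample(range(num_nodes), fanout))
        rounds += 1
    return rounds

print(gossip_rounds(1000))  # converges in roughly O(log N) rounds
```

The appeal is that no node needs a global view, and the failure of any single node barely slows propagation; the flip side, as the S3 incident showed, is that a corrupted message can spread just as efficiently as a correct one.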
So how do we avoid the full system reboot problem?
A good start would be to use a highly tested and widely deployed infrastructure rather than building it yet again from scratch. XMPP is an interesting candidate for that very reason. I believe the really large-scale, low-latency middleware and hardware used in the financial services markets would represent an even more battle-tested "gossip" infrastructure. Amazon, however, was using a homegrown system, and there will always be unexpected corner cases that get debugged in production while the system is a work in progress.
Gossip-as-a-Service, anyone? And no, you can't build it on top of Amazon SQS.
Thursday, July 31, 2008

2 comments:
Concerning the Amazon outage with their homegrown middleware: why not use the platform-independent Entera/NXTera middleware for gossiping? It has been in use for years at banks, insurance companies, aerospace, oil & gas, and retail companies. It is so good that companies don't realize they have it.
I'm not familiar with Entera/NXTera but I will check it out.
When I wrote the blog posting I was thinking of message-oriented middleware such as Talarian SmartSockets, SonicMQ, or TIBCO Rendezvous; of low-latency messaging products like 29West, IBM MQ LLM, and Wombat MAMA; and even of hardware messaging vendors like Tervela and Solace Systems, which could definitely handle a cloud the scale of Amazon's.