Big Problem

Brooks Davis brooks at one-eyed-alien.net
Wed Aug 4 09:49:55 PDT 2004


On Wed, Aug 04, 2004 at 07:34:31PM +0800, Kathy Quinlan wrote:
> Hi Guys and Gals,
> 
> First off, I am not a troll; this is a serious email. I cannot go into
> too many fine points as I am bound by an NDA.
> 
> The problem:
> 
> I need to hold a text file in RAM; in the foreseeable future the text
> file could be up to 10TB in size.
> 
> My Options:
> 
> Design a computer (probably multiple AMD64s) to handle 10TB of memory
> (plus a few extra GB of RAM for system overhead) and hold the file in one
> physical computer system.
> 
> Build a server farm and have each server hold a portion, e.g. 4GB per
> server (250 servers, plus a few extra for system overhead).

That only gets you to 1TB...
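A quick check of that figure, as a sketch assuming decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes); servers_needed is a hypothetical helper name, not anything from the thread:

```python
# Decimal units are an assumption; binary units (GiB/TiB) shift the
# counts by a few percent.
GB = 10**9
TB = 10**12

def servers_needed(total_bytes: int, per_server_bytes: int) -> int:
    """Smallest machine count whose combined RAM holds total_bytes,
    ignoring OS overhead and any spare machines."""
    return -(-total_bytes // per_server_bytes)  # integer ceiling division

print(250 * 4 * GB // TB)               # 1    -> 250 servers x 4 GB is only 1 TB
print(servers_needed(10 * TB, 4 * GB))  # 2500 -> servers needed for 10 TB at 4 GB each
```

At 4GB per server, 10TB needs ten times the proposed 250 machines before any overhead is counted.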

> The reason the file needs to be in RAM is that I need fast searches
> for patterns in the data (less than 1 second to pull out relevant chunks).
>
> I am sure I have missed some options; right now I am just kicking ideas
> around. The software will be based on FreeBSD with some major
> modifications to address the large amount of RAM (probably set up as
> a virtual drive holding one file).

Depending on your budget, I'd either give Cray or SGI a call, or build
a cluster of AMD64 machines.

You can get 16GB in a 1U chassis, so that would reduce your requirements
to around 700 machines, call it 18 racks minus the networking.  You will
not be able to use that as a RAM disk and stripe it for a single machine
to search.  First, there's no way you'll be able to maintain any kind
of uptime if you do that.  With 5600 DIMMs, you'll lose at least one a
week, probably more.  Second, even assuming you could completely process
one 64-bit word per cycle and had enough bandwidth, a single machine
would need 625 seconds to process 10TB of data.  What you will need to
do is build a distributed application that a) allows processing to run
on each machine and b) provides a mechanism for fault tolerance in the
face of machine failures.
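The figures above can be reproduced with a little arithmetic. The 625-second figure works out exactly if 10TB is taken as 10^13 bytes and the processor runs at 2 GHz, and 700 machines become 5600 DIMMs if each 16GB machine uses eight 2GB DIMMs; all three are assumptions made here, not stated in the message. The helper names are hypothetical.

```python
# Back-of-envelope check of the reply's figures. Assumptions not stated in
# the message: decimal units, a 2 GHz clock, and 2 GB DIMMs (8 per machine).
GB = 10**9
TB = 10**12
WORD_BYTES = 8            # one 64-bit word
CLOCK_HZ = 2 * 10**9      # assumed 2 GHz, one word processed per cycle

def machines_needed(total_bytes: int, per_machine_bytes: int) -> int:
    """Integer ceiling of total / per-machine RAM, with no headroom for the OS."""
    return -(-total_bytes // per_machine_bytes)

def scan_seconds(data_bytes: int, machines: int = 1) -> float:
    """Time to touch every word once with the data split evenly across
    machines, ignoring memory bandwidth, network and coordination costs."""
    return data_bytes / machines / WORD_BYTES / CLOCK_HZ

machines = 700                              # the reply's rounded-up figure
dimms_per_machine = (16 * GB) // (2 * GB)   # 8 DIMMs of 2 GB each

print(machines_needed(10 * TB, 16 * GB))    # 625 before any headroom
print(machines * dimms_per_machine)         # 5600 DIMMs
print(scan_seconds(10 * TB))                # 625.0 seconds on a single processor
print(round(scan_seconds(10 * TB, machines), 2))  # 0.89 seconds per machine in parallel
```

The last line applies the same idealized model with the data spread over 700 machines: each machine's share scans in under a second, which is one way to see why the reply recommends running the processing on each machine rather than funneling all 10TB through one.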

You would do well to read up on the techniques used by Google to manage
unreliable systems and provide high-performance search.
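The general shape of the distributed design described above (each machine searches the shard it holds in RAM, and replicas cover machines that fail) can be sketched as a scatter-gather search. This is a minimal single-process sketch, not Google's system or a tested design: Node, build_cluster, search_shard, cluster_search and MAX_PATTERN are hypothetical names, threads stand in for network RPCs, and every shard is stored on two machines.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

# Hypothetical limit on pattern length; each shard also stores the next
# MAX_PATTERN - 1 characters so matches crossing a shard boundary are found.
MAX_PATTERN = 64


@dataclass
class Node:
    """A hypothetical machine holding some shards of the text in RAM."""
    name: str
    up: bool = True
    # shard_id -> (global start offset, text incl. overlap, length of own range)
    shards: dict = field(default_factory=dict)

    def search(self, shard_id: int, pattern: str) -> list:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        start, text, own_len = self.shards[shard_id]
        hits = []
        i = text.find(pattern)
        # Report only matches starting in this shard's own range; matches
        # starting in the overlap belong to the next shard.
        while i != -1 and i < own_len:
            hits.append(start + i)
            i = text.find(pattern, i + 1)
        return hits


def build_cluster(data: str, shard_size: int, nodes: list, replicas: int = 2) -> dict:
    """Cut data into shards and place each on `replicas` distinct nodes
    (requires replicas <= len(nodes)). Returns shard_id -> holders."""
    placement = {}
    for shard_id, start in enumerate(range(0, len(data), shard_size)):
        own_len = min(shard_size, len(data) - start)
        text = data[start:start + shard_size + MAX_PATTERN - 1]
        holders = [nodes[(shard_id + r) % len(nodes)] for r in range(replicas)]
        for node in holders:
            node.shards[shard_id] = (start, text, own_len)
        placement[shard_id] = holders
    return placement


def search_shard(holders: list, shard_id: int, pattern: str) -> list:
    """Try each replica in turn; fail only if every copy is unreachable."""
    for node in holders:
        try:
            return node.search(shard_id, pattern)
        except ConnectionError:
            continue
    raise RuntimeError(f"shard {shard_id} is unavailable on every replica")


def cluster_search(placement: dict, pattern: str) -> list:
    """Scatter the query to all shards in parallel, gather sorted global offsets."""
    if not pattern or len(pattern) > MAX_PATTERN:
        raise ValueError("pattern must be 1..MAX_PATTERN characters")
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_shard, holders, sid, pattern)
                   for sid, holders in placement.items()]
        return sorted(hit for f in futures for hit in f.result())


nodes = [Node(f"node{i}") for i in range(4)]
placement = build_cluster("abcXYZdefXYZghiXYZjkl", shard_size=8, nodes=nodes)
nodes[1].up = False                       # simulate a machine lost to a bad DIMM
print(cluster_search(placement, "XYZ"))   # [3, 9, 15]
```

With node1 down, shard 1 is answered by its second replica on node2, and the match at offset 15 (which straddles the boundary between shards 1 and 2) is still found because of the overlap. A replication factor of two tolerates any single machine failure; surviving a second failure in the same replica set would need more copies.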

-- Brooks

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4


More information about the freebsd-hardware mailing list