The InTransit Filesystem
A long time ago I was watching a story about data, and one fact that rang in my mind was how much data is in transit at any one point in time. A classic way to think about this is a transatlantic fibre cable. Even at the speed of light, the amount of data bouncing around inside that cable grows as the distance increases. It occurred to me that you could actually use the in-transit bandwidth as a filesystem.
There are a number of reasons you might want to do this:
- A really secure filesystem: since you have to continually retransmit the data, you simply unplug your PC and the data disappears
- A common storage system that multiple people can use
- A storage system that grows as the number of diverse paths increases
Below are some notes I made on a recent international flight. At the time I considered myself the in-transit filesystem!
To this day the filesystem has not been implemented, though the thought of finishing the design and building it still lingers in some part of my brain.
----
The In Transit Filesystem
Concept: A filesystem that exists in the propagation delay of a transmission medium
Examples: There exist many international fibre cables. These cables can
transmit large amounts of data in a relatively short amount of time. If a cable
is capable of 1 Tb/s and there is a latency of 1 ms to the other end of the
cable, then there are approximately 1 Tb/s × 0.001 s = 1 Gb of data in transit
at any one time. The aim of the in-transit filesystem is to store data in this
transmission space. Constant retransmission of data maintains the live
filesystem. If the network goes down or the data is not retransmitted, the
filesystem goes down and the data is lost.
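
As a back-of-the-envelope check, the in-flight capacity is just the bandwidth-delay product. A minimal sketch in Python, using the illustrative numbers from the example above:

```python
def in_flight_capacity_bits(bandwidth_bps: float, one_way_delay_s: float) -> float:
    """Bandwidth-delay product: how many bits are 'on the wire' at once."""
    return bandwidth_bps * one_way_delay_s

# The example above: a 1 Tb/s cable with 1 ms of one-way latency
# holds roughly 1 Gb (about 125 MB) in flight.
capacity = in_flight_capacity_bits(1e12, 0.001)
print(f"{capacity:.0f} bits in transit (~{capacity / 8 / 1e6:.0f} MB)")
```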
Why: The reasoning behind the In Transit Filesystem
The In Transit Filesystem does away with a physical storage medium attached
to any one machine. Instead, the storage medium is simply propagation delay.
This sort of storage has a number of applications. These include security
related keys that exist for a limited time, i.e. consider the concept of
email that decays. The keys to the email exist in the In Transit Filesystem.
The email client asks for the keys and, if they are available, decodes the
message. The sender is the pingback for the filesystem, hence the sender
decides whether to retransmit the key or not. If they don't, the email is now
undecodable (easily, anyway). The filesystem can also be used to establish
common tables. Consider a router that needs to build MAC tables. Instead of
sending a broadcast ARP request to resolve the address of a machine, it can
query the in-transit filesystem. This not only allows quicker resolution of
the address but also reduces broadcast traffic.
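
To make the decaying-email idea concrete, here is a hypothetical sketch. `lookup_key` stands in for whatever query interface the filesystem would expose (not a real API), and the XOR "decryption" is only there to keep the example self-contained:

```python
from typing import Callable, Optional

def read_decaying_email(
    lookup_key: Callable[[str], Optional[bytes]],  # queries the in-transit FS
    ciphertext: bytes,
    key_id: str,
) -> Optional[bytes]:
    """Decode a decaying email. The sender is the pingback: once it stops
    retransmitting the key, lookup_key returns None and the mail is
    (easily, anyway) undecodable."""
    key = lookup_key(key_id)
    if key is None:
        return None  # the key has decayed out of the transit medium
    # Toy XOR cipher purely to keep the sketch runnable; a real client
    # would use a proper symmetric cipher.
    return bytes(c ^ key[i % len(key)] for i, c in enumerate(ciphertext))
```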
Objective: The aim of this brief is to assess the requirements needed to not
only build the In Transit Filesystem but also to make it resilient.
Problems & Solutions:
There are a number of issues when building a filesystem of this type. These are
listed below and discussed afterwards with possible solutions.
o Calculating the Filesystem Size
o Latency to packets/files in read/writing
o Dealing with packet drop
o Retransmission
o Error checking and redundancy
o Transport medium
The problems discussed:
o Calculating filesystem size
With the storage medium live, the filesystem size is not only hard to calculate
but may be continually changing. The aim is to be able to guarantee a minimum
filesystem size so data can be recovered.
An ongoing calculation supports this by measuring packet latency over time. A
simple average gives an upper limit to the available data space. Of course,
adding another route gives the filesystem increased space. For advanced
filesystem calculations, packet loss (including RED and similar algorithms),
window sizing, and other medium fallbacks must be considered. These algorithms
are not yet determined.
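
One way the ongoing calculation could look, assuming the filesystem periodically samples round-trip times on the link. The floor/ceiling split is an assumption of this sketch, not part of the original notes:

```python
def filesystem_size_bits(bandwidth_bps: float, rtt_samples_s: list[float]) -> tuple[float, float]:
    """Ongoing size estimate from measured round-trip times.

    The average one-way delay gives an upper limit on the available space;
    the smallest observed delay gives a floor that can be guaranteed so
    data can always be recovered as latency fluctuates.
    """
    one_way = [rtt / 2 for rtt in rtt_samples_s]
    upper = bandwidth_bps * (sum(one_way) / len(one_way))
    guaranteed = bandwidth_bps * min(one_way)
    return guaranteed, upper

# e.g. a 1 Tb/s link with RTTs between 2 ms and 3 ms:
print(filesystem_size_bits(1e12, [0.002, 0.0025, 0.003]))  # (1e9, 1.25e9)
```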
o Latency to packets/files in read/writing
The access latency of the filesystem is at worst 2× the propagation delay of
the medium link. This delay can often be up to 10 seconds (on long haul links
with slow transmission speeds or congestion). This delay is unacceptable. To
increase speed, local caching can be applied, though this really converts the
filesystem from an in-transit system to an in-memory system. Delays can also be
reduced by secondary links and forward propagation, i.e. splitting up the
blocks so a file is transmitted fragmented, with multiple packets containing
the same part of a file (see the sketch below). Of course, this reduces the
available size of the filesystem.
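
A sketch of the forward-propagation idea, assuming a file has already been split into blocks. Interleaving the copies keeps any block at most one full pass of the stream away, at the cost of dividing the effective capacity by the replication factor:

```python
def replicate_fragments(blocks: list[bytes], copies: int) -> list[tuple[int, bytes]]:
    """Emit each (index, block) pair `copies` times, one full pass per copy,
    so consecutive copies of a block are exactly len(blocks) packets apart.
    Effective filesystem capacity shrinks by a factor of `copies`."""
    stream: list[tuple[int, bytes]] = []
    for _ in range(copies):
        stream.extend(enumerate(blocks))
    return stream

stream = replicate_fragments([b"hdr", b"data1", b"data2"], copies=2)
# [(0, b'hdr'), (1, b'data1'), (2, b'data2'), (0, b'hdr'), (1, b'data1'), (2, b'data2')]
```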
o Dealing with Packet Drop
Packet drop in the In Transit Filesystem is not just bad but devastating.
Without some sort of redundancy, a dropped packet will cause filesystem
corruption. To deal with this there has to be redundancy.
o Error checking and redundancy
Whilst initially the filesystem will be implemented with traditional algorithms
to support redundancy (i.e. md5, forward and backward CRCs, etc.), long term
such schemes are not scalable. Data is growing at a large rate, much of which
doesn't need to be archived but is anyway. We take an approach that is used in
the ZFS filesystem, that being block deduplication: files that share a block of
data need only store that block once (see the sketch below).
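
A minimal sketch of the block-dedup idea (SHA-256 here rather than md5, purely as a content key). Identical blocks are kept in flight once, and files are just ordered lists of block hashes:

```python
import hashlib

class DedupStore:
    """ZFS-style block deduplication: a block shared by many files only
    needs to be transmitted (and retransmitted) once."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}     # content hash -> block in flight
        self.files: dict[str, list[str]] = {}  # file name -> ordered block hashes

    def write(self, name: str, data: bytes, block_size: int = 4096) -> None:
        hashes = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # duplicates stored only once
            hashes.append(digest)
        self.files[name] = hashes

    def read(self, name: str) -> bytes:
        return b"".join(self.blocks[h] for h in self.files[name])
```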
We also introduce a new concept that the current generation of filesystems
doesn't support: guaranteed reliability levels. Whilst the computer age has
brought us digital data, the layering used to store that data relies on
consistency at the filesystem level, i.e. a file is always 100% intact. If any
bits of that file are not intact, bad things happen. This places a lot of
constraints on the filesystem in regard to the management of data. Whilst I
don't believe filesystems can't have reliability, I believe that applications
can perform some redundancy of their own formats. They then ask the filesystem
to guarantee a level of consistency for a file. For example: a modern-day
application requests the creation of a file with a guaranteed consistency of
100%. Application X (which, say, stores a lossy image format) requests that
block x of the file be 100% guaranteed (i.e. the file header), then places a
70% guarantee on all other blocks (the image data). Hence, if data is missing
or the filesystem begins to fill up, the filesystem has the option of dropping
packets to accommodate expansion.
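
A hypothetical shape for that API, with the guarantee modelled as a survival probability when the filesystem has to shed packets. Names and semantics here are illustrative only:

```python
import random
from dataclasses import dataclass, field

@dataclass
class Block:
    data: bytes
    guarantee: float  # 1.0 = must always survive, 0.7 = droppable under pressure

@dataclass
class LossyFile:
    blocks: list[Block] = field(default_factory=list)

    def write(self, data: bytes, guarantee: float) -> None:
        self.blocks.append(Block(data, guarantee))

    def evict_under_pressure(self) -> int:
        """When the filesystem fills, drop each block with probability
        1 - guarantee. 100%-guaranteed blocks (e.g. file headers) survive."""
        before = len(self.blocks)
        self.blocks = [b for b in self.blocks if random.random() < b.guarantee]
        return before - len(self.blocks)

# Application X: the header must survive, lossy image data may be shed.
img = LossyFile()
img.write(b"IMG-HEADER", guarantee=1.0)
for chunk in (b"pixels-0", b"pixels-1", b"pixels-2"):
    img.write(chunk, guarantee=0.7)
```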
Sam said,
This is quite similar to the Acoustic Delay Line Memory some early computers used around 1950, but on a much larger scale.