Notice: This document is a work-in-progress.
Depot intends to provide a new way for users to organize, interact with, and back up their documents in a more efficient, flexible, and rich manner.
Technically, Depot is a system daemon that forms a layer above the filesystem and provides the additional, advanced features described below.
Note that Depot does not intend to store system files; it intends to store the documents users work with.
Depot also does not intend to store objects from the personal information space, such as contacts, emails, and IM logs, because of the amount of work involved and the platform-specific nature of the task.
In the past, various structural storage problems hurt my productivity whenever I worked with a large number of files that came from various sources.
Some frustrations of mine related to file management:
The following section describes the concepts on which Depot is planned to be built.
Every file and directory is an object in Depot.
Objects have the following qualities that make them superior to ordinary files:
Ordinary files have a very primitive structure. A file is 1) a sequence of bytes, 2) a fixed set of hardcoded, platform-specific metadata, and 3) a set of extended attributes that only some systems support.
Why is this important? Because this structure implies that format-specific metadata must be stored in the file stream itself. Whenever an application wants to query or manipulate the metadata of a file, it needs to use a library. Such libraries have diverse APIs, they are specific to one language or a set of languages, and they do not cache metadata.
Depot provides a clean, consistent interface for querying and manipulating object metadata that is easy to wrap in any language. Depot also caches the metadata, so metadata queries are fast.
Some applications associate their own metadata with files. For example, 1) a music player lets the user rate tracks on a scale of 1 to 5, and 2) a photo manager lets the user assign a set of categories to each picture.
Application-specific metadata is stored in diverse, often obscure ways, so sharing it across applications is typically hard or impossible. Depot provides a standard way for applications to store metadata so that it is accessible consistently and effortlessly to every application.
Every object stored in Depot is indexed in a database. It is therefore possible to issue complex search queries that return accurate results quickly.
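To make the idea of an indexed object store more concrete, here is a minimal sketch using an in-memory SQLite database. The schema (an `objects` table plus a `members` table) and the helper names `build_index` and `find` are assumptions made for illustration; this document does not specify Depot's actual database layout or query interface.

```python
import sqlite3

def build_index(objects):
    """Index (name, class_name, members_dict) tuples in an in-memory database.

    Hypothetical schema: one row per object, one row per (object, member) pair.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, name TEXT, class TEXT)")
    db.execute("CREATE TABLE members (object_id INTEGER, member TEXT, val)")
    db.execute("CREATE INDEX members_by_name ON members (member, val)")
    for name, class_name, members in objects:
        cur = db.execute(
            "INSERT INTO objects (name, class) VALUES (?, ?)", (name, class_name)
        )
        db.executemany(
            "INSERT INTO members VALUES (?, ?, ?)",
            [(cur.lastrowid, key, value) for key, value in members.items()],
        )
    return db

def find(db, class_name, member, min_value):
    """Return names of objects of a class whose member is >= min_value."""
    rows = db.execute(
        "SELECT o.name FROM objects o JOIN members m ON m.object_id = o.id "
        "WHERE o.class = ? AND m.member = ? AND m.val >= ? ORDER BY o.name",
        (class_name, member, min_value),
    )
    return [name for (name,) in rows]

db = build_index([
    ("a.mp3", "MP3File", {"rating": 5}),
    ("b.mp3", "MP3File", {"rating": 2}),
    ("c.jpg", "JPGFile", {"rating": 5}),
])
print(find(db, "MP3File", "rating", 4))  # ['a.mp3']
```

Because the members live in an indexed table rather than inside each file, a query like this never has to open or parse the files themselves.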
Examples of these kinds of searches are:
Every file object stored in Depot is associated with its signature. A signature is a characteristic value that can be generated from any file in a very short amount of time. It is used to decide whether two files are identical. In practice, different files with the same signature should be extremely rare.
In the current implementation, the signature of a file is the MD5 sum of its middle 4096 bytes, or the MD5 sum of the entire file if the file is shorter than 4097 bytes. This method has proved to be efficient and accurate.
It is therefore very easy to query whether the files the user encounters are duplicates or new files.
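The signature rule described above can be sketched in a few lines of Python. This is an illustration rather than Depot's own code: the function names are hypothetical, and the exact position of the "middle" window (centred, rounding the start offset down) is an assumption, since the text does not say how it is aligned.

```python
import hashlib
import os

WINDOW = 4096  # number of bytes hashed, as stated in the text

def signature(path):
    """Return the hex MD5 of the middle 4096 bytes of a file, or of the
    whole file when it is shorter than 4097 bytes."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if size > WINDOW:
            # Assumption: the window is centred, rounding the offset down.
            f.seek((size - WINDOW) // 2)
        data = f.read(WINDOW)
    return hashlib.md5(data).hexdigest()

def is_duplicate(path, known_signatures):
    """Return True if the file's signature is already in the given set."""
    return signature(path) in known_signatures
```

Since at most 4096 bytes are read regardless of file size, the cost of computing a signature stays roughly constant, which is what makes the duplicate check cheap even for very large files.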
Every object is an instance of a class, so the structure of a class determines the features of its objects.
The following section describes the structure of classes.
A data member, or member for short, describes an attribute of a class.
Let's see a class definition example with various members:
class Movie:
    int imdb_id             // the Internet Movie Database identifier of the movie
    string english_title    // the English title of the movie
    string hungarian_title  // the Hungarian title of the movie
    int year                // the year in which the movie was released
This class describes a movie and has four members. Note the type before the name of each member: every member must have a type, which defines the set of values the member can hold.
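For readers more familiar with a general-purpose language, the Movie class can be mirrored as a Python dataclass. This is only an analogy: the mapping of Depot's `int` and `string` to Python's `int` and `str`, and the `check_types` helper, are assumptions made for illustration, and the example values are placeholders.

```python
from dataclasses import dataclass, fields

@dataclass
class Movie:
    imdb_id: int          # the Internet Movie Database identifier
    english_title: str    # the English title
    hungarian_title: str  # the Hungarian title
    year: int             # the year of release

def check_types(obj):
    """Raise TypeError if a member holds a value outside its declared type."""
    for f in fields(obj):
        value = getattr(obj, f.name)
        if not isinstance(value, f.type):
            raise TypeError(
                f"{f.name} must be {f.type.__name__}, got {type(value).__name__}"
            )

# Placeholder values, not real catalogue data.
movie = Movie(imdb_id=1, english_title="Example", hungarian_title="Példa", year=2000)
check_types(movie)  # passes silently: every value matches its member type
```

The check makes the role of the type explicit: a member's type restricts which values an object may store in it.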
Currently the following types are applicable:
Interfaces provide a way to bind a set of data members to a set of classes.
Let's see an example interface definition:
interface RatingAndPace:
    int rating              // describes how much you like the object
    enum pace {slow, fast}  // describes the pace of the object
After defining such an interface, you can bind it to your Movie and Music classes, so no member definitions are duplicated.
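The binding of one interface to several classes might be modelled as follows. This is a sketch under stated assumptions: the `Definition` type and its `bind` and `members` methods are hypothetical, and the single member given to Music is invented, since the text only mentions that class by name.

```python
from enum import Enum

class Pace(Enum):
    SLOW = "slow"
    FAST = "fast"

class Definition:
    """A named set of typed members; used here for classes and interfaces."""

    def __init__(self, name, members):
        self.name = name
        self.own_members = dict(members)
        self.bound = []  # interfaces bound to this class

    def bind(self, interface):
        """Attach an interface without copying its member definitions."""
        self.bound.append(interface)

    def members(self):
        """Return own members plus the members of every bound interface."""
        merged = dict(self.own_members)
        for interface in self.bound:
            merged.update(interface.own_members)
        return merged

rating_and_pace = Definition("RatingAndPace", {"rating": int, "pace": Pace})
movie = Definition("Movie", {
    "imdb_id": int, "english_title": str, "hungarian_title": str, "year": int,
})
music = Definition("Music", {"artist": str})  # invented member, for illustration

movie.bind(rating_and_pace)
music.bind(rating_and_pace)

print(sorted(music.members()))  # ['artist', 'pace', 'rating']
```

The `rating` and `pace` members are defined once, in the interface, and both classes see them through the binding.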
Every class has a place in the class hierarchy.
The standard class hierarchy is shown below.
+-+ Object
  |
  +-+ File
  | |
  | +-+ TextFile
  | | |
  | | +-- DOCFile   // MIME:application/msword; extension:doc
  | | +-- HTMLFile  // MIME:text/html; extension:htm,html
  | | +-- PDFFile   // MIME:application/pdf; extension:pdf
  | | +-- PPTFile   // MIME:application/vnd.ms-powerpoint; extension:ppt
  | | +-- PSFile    // MIME:application/postscript; extension:ps
  | | +-- RTFFile   // MIME:application/rtf; extension:rtf
  | | +-- XLSFile   // MIME:application/vnd.ms-excel; extension:xls
  | |
  | +-+ ImageFile
  | | |
  | | +-- GIFFile   // MIME:image/gif; extension:gif
  | | +-- JPGFile   // MIME:image/jpeg; extension:jpg,jpeg
  | | +-- PNGFile   // MIME:image/png; extension:png
  | |
  | +-+ AudioFile
  | | |
  | | +-- MP3File   // MIME:audio/mpeg; extension:mp3
  | | +-- OGGFile   // MIME:application/ogg; extension:ogg
  | |
  | +-+ VideoFile
  | | |
  | | +-- AVIFile   // MIME:video/avi; extension:avi
  | | +-- MOVFile   // MIME:video/quicktime; extension:mov
  | | +-- MPGFile   // MIME:video/mpeg; extension:mpg,mpeg
  | |
  | +-- UnknownFile
  |
  +-- Directory
The standard class hierarchy can be further extended by
A class can be subclassed by creating a new child class that references an existing (parent) class. The parent class is not modified; the child class inherits its data members and can be extended with additional members.
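Subclassing as described here behaves much like inheritance between Python dataclasses, which gives a quick way to see the effect. The child class `SubtitledMovie` and its `subtitle_language` member are hypothetical and exist only for this illustration; the parent repeats the Movie members from the class definition example.

```python
from dataclasses import dataclass, fields

@dataclass
class Movie:
    imdb_id: int
    english_title: str
    hungarian_title: str
    year: int

@dataclass
class SubtitledMovie(Movie):  # hypothetical child class referencing Movie
    subtitle_language: str    # hypothetical additional member

def member_names(cls):
    """List the data members of a class, inherited members first."""
    return [f.name for f in fields(cls)]

print(member_names(Movie))
# ['imdb_id', 'english_title', 'hungarian_title', 'year']
print(member_names(SubtitledMovie))
# ['imdb_id', 'english_title', 'hungarian_title', 'year', 'subtitle_language']
```

The parent keeps exactly its four members, while the child exposes those four plus its own.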
A store is a repository that keeps track of every object of a user or a group.
A store has 4 distinct but related namespaces that make the above features possible:
The catalog namespace stores every object that was cataloged from tertiary storage media, typically DVDs and CDs. This namespace stores only the file-specific metadata and signatures of the cataloged files, not the files themselves. For each file, the catalog also stores the hash values used by the most popular file sharing networks, which improves availability in case a tertiary storage medium becomes unreadable.
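One way to picture a catalog entry is as a small record keyed by signature. The sketch below is an assumption about shape, not Depot's actual data model: the names `CatalogEntry`, `Catalog`, `medium`, and `network_hashes` are all hypothetical, and the network names used as dictionary keys are left to the caller.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    signature: str  # signature of the cataloged file
    medium: str     # label of the disc the file was cataloged from
    metadata: dict = field(default_factory=dict)        # file-specific metadata
    network_hashes: dict = field(default_factory=dict)  # network name -> hash

class Catalog:
    """Holds entries only; the cataloged files stay on their media."""

    def __init__(self):
        self._by_signature = {}

    def add(self, entry):
        self._by_signature.setdefault(entry.signature, []).append(entry)

    def is_cataloged(self, signature):
        """Return True if a file with this signature was cataloged."""
        return signature in self._by_signature

    def media_for(self, signature):
        """Return the labels of all media that hold this file."""
        return [e.medium for e in self._by_signature.get(signature, [])]
```

Keying entries by signature means a file found on disk can be matched against the catalog without the original medium being mounted.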
In the catalog namespace, objects have a special member with the following definition:
enum catalog_state { linked, unlinked, deprecated }
The above enum values have the following meaning:
The depot namespace stores every object that is currently online.
In the depot namespace, objects have a special member with the following definition:
enum depot_state { online, offline, queued }
The above enum values have the following meaning:
The classes namespace stores the class hierarchy of the store as a tree of objects.
The interfaces namespace stores the interfaces of a store as a tree of objects.