Depot Daemon

From Ultimate Commander

Notice: This document is a work-in-progress.

Abstract

Depot intends to provide a new way for users to organize, interact with, and back up their documents in a more efficient, flexible, and rich manner.

Technically, Depot is a system daemon that forms a layer above the file system and provides the additional features described below.

Note that Depot does not intend to store system files. It intends to store the documents that users work with.

Depot also does not intend to store personal information space objects such as contacts, emails, and IM logs, because of the amount of work involved and the platform-specific nature of the task.

Why is the file system not enough?

In the past, various structural storage problems have hurt my productivity when I had to work with a large number of files that came from various sources.

Some of my frustrations with file management:

  • I want to make a catalog of the files stored on my DVDs and CDs. I want this catalog to store all relevant metadata, such as file-format-specific metadata, thumbnails of my images, and selected screenshots of my movies.
  • I want a global namespace where I can store all of my online (on-disk) and offline (cataloged and backed up) files. Offline files should be visualized differently, and bringing them online should be very convenient.
  • I want to edit the file-format-specific metadata of my music files just like the cells of a spreadsheet.
  • I want to structure my global namespace in finer-grained ways than the ordinary file system allows, such as marking directories that contain individual movies as Movie objects that have special fields such as "IMDB ID", "Title", or "Hungarian Title".
  • I want to assign various additional fields to these objects, such as rating my music files on a scale of 1 to 10.
  • When browsing any directory, I want to know which files I already have in my global namespace or my catalog.

Implementation

The following section describes the concepts that Depot is planned to be built upon.

Objects

Every file and directory is an object in Depot.

Objects have the following qualities that make them superior to ordinary files:

Objects expose metadata

Ordinary files have a very primitive structure. A file is 1) a sequence of bytes, 2) a fixed set of hardcoded, platform-specific metadata, and 3) a set of extended attributes that only some systems support.

Why is this important? Because this structure implies that file-format-specific metadata must be stored in the file stream. Whenever an application wants to query or manipulate the metadata of a file, it needs to use a library. Such libraries have diverse APIs, are specific to one language or a small set of languages, and do not cache metadata.

Depot provides a clean, consistent interface for querying and manipulating object metadata that is easy to wrap in any language. Depot also caches the metadata, so metadata queries are very fast.

Objects are extensible

Some applications associate their own metadata with files. For example, 1) a music player lets the user rate songs on a scale of 1 to 5, and 2) a photo manager lets the user assign a set of categories to each picture.

Application-specific metadata is stored in diverse, often obscure ways, so sharing it across applications is typically hard or impossible. Depot provides a standard way for applications to store metadata so that it is consistently and effortlessly accessible to every application.

Objects are indexed

Every object stored in Depot is indexed in a database. It is therefore possible to issue complex search queries that return accurate results very quickly.

Examples of these kinds of searches are:

  • List movies that are longer than 4 minutes and 32 seconds and have the string "fire" in their title. Order them by release date, ascending.
  • List those pictures that feature Andy and Mary and have a resolution above 400x300.
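
As a rough illustration of the first query, here is a minimal sketch that uses an in-memory SQLite database. The table name, the column names, and the sample rows are all made up for this example. Depot's actual index schema and query interface are not described on this page.

```python
import sqlite3

# Hypothetical index table; Depot's real schema is not specified here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (title TEXT, duration_seconds INTEGER, release_date TEXT)"
)
# Made-up sample rows, with ISO dates so that text ordering is date ordering.
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        ("Campfire Stories", 300, "2003-05-01"),
        ("Fire Drill", 5400, "1998-02-14"),
        ("Short Fire", 120, "2001-01-01"),
        ("Rainy Day", 6000, "1995-07-07"),
    ],
)


def find_movies(conn, min_seconds, title_fragment):
    """Return titles longer than min_seconds that contain title_fragment,
    ordered by ascending release date."""
    rows = conn.execute(
        "SELECT title FROM movies "
        "WHERE duration_seconds > ? AND title LIKE ? "
        "ORDER BY release_date ASC",
        (min_seconds, "%" + title_fragment + "%"),
    )
    return [title for (title,) in rows]


# 4 minutes and 32 seconds is 272 seconds.
print(find_movies(conn, 4 * 60 + 32, "fire"))
```

SQLite's LIKE operator ignores case for ASCII letters by default, so "fire" also matches "Fire Drill".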

Objects are signed

Every file object stored in Depot is associated with its signature. A signature is a characteristic value that can be generated from any file in a very short amount of time. It is used to decide whether two files are identical. In practice, different files with the same signature should be extremely rare.

In the current implementation, the signature of a file is the MD5 sum of its middle 4096 bytes, or the MD5 sum of the entire file if the file is shorter than 4097 bytes. This method has proved efficient and accurate in practice.

It is therefore very easy to query whether the files the user encounters are duplicates or new files.
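
The signature scheme described above can be sketched as follows. The function names are hypothetical, and the exact meaning of "middle" is an assumption: the sketch takes the 4096-byte block centred in the file and rounds the start offset down.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # size of the sampled block, as stated in the text


def signature(path):
    """Return the hex MD5 of the middle 4096 bytes of the file at path.

    Files of 4096 bytes or fewer are hashed in full.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if size <= BLOCK_SIZE:
            data = f.read()
        else:
            # Assumption: "middle" means the block centred in the file,
            # with the start offset rounded down.
            f.seek((size - BLOCK_SIZE) // 2)
            data = f.read(BLOCK_SIZE)
    return hashlib.md5(data).hexdigest()


def is_duplicate(path, known_signatures):
    """Report whether the file's signature is already in the given set."""
    return signature(path) in known_signatures
```

Because at most 4096 bytes are read per file, the cost of computing a signature does not grow with file size.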

Classes

Every object is an instance of a class, so the structure of a class determines the features of its objects.

The following section describes the structure of classes.

A class has a set of data members

A data member, or member for short, describes an attribute of a class.

Let's see a class definition example with various members:

class Movie:
    int imdb_id             // the Internet Movie Database identifier of the movie.
    string english_title    // the English title of the movie.
    string hungarian_title  // the Hungarian title of the movie.
    int year                // the year the movie was released.

This class describes a movie and has four members. Note the type that precedes each member name: every member must have a type, which defines the set of values it can hold.

Currently the following types are applicable:

  • boolean stores a logical value. It can either be true or false.
  • int stores a 32-bit signed integer value.
  • string stores a string of characters.
  • enum stores one value from a set of predefined constants.
  • set stores a subset of a set of predefined constants.
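
To make the value sets of these types concrete, here is a minimal validation sketch. The function check_value and its constants parameter are hypothetical helpers for this example, not part of Depot.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def check_value(member_type, value, constants=()):
    """Return True if value is valid for a member of the given type.

    constants lists the predefined constants of an enum or set member.
    """
    if member_type == "boolean":
        return isinstance(value, bool)
    if member_type == "int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return (isinstance(value, int) and not isinstance(value, bool)
                and INT32_MIN <= value <= INT32_MAX)
    if member_type == "string":
        return isinstance(value, str)
    if member_type == "enum":
        return value in constants
    if member_type == "set":
        # Every element must be one of the predefined constants.
        return isinstance(value, (set, frozenset)) and value <= set(constants)
    raise ValueError("unknown member type: " + member_type)
```

For example, check_value("enum", "slow", ("slow", "fast")) accepts the value, while check_value("int", 2**31) rejects it because it does not fit in 32 bits.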

A class has a set of interfaces

Interfaces provide a way to bind a set of data members to a set of classes.

Let's see an example interface definition:

interface RatingAndPace:
    int rating              // describes how much you like the object.
    enum pace {slow, fast}  // describes the pace of the object.

After defining such an interface, you can easily bind it to your Movie and Music classes so no duplicate member definitions are involved.
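
One way to picture the binding is the following sketch. The dictionary layout and the all_members helper are assumptions made for illustration, and the Music class is left without members of its own because this page does not define any.

```python
# Hypothetical in-memory representation: member name -> type name.
INTERFACES = {
    "RatingAndPace": {"rating": "int", "pace": "enum"},
}

CLASSES = {
    "Movie": {
        "members": {
            "imdb_id": "int",
            "english_title": "string",
            "hungarian_title": "string",
            "year": "int",
        },
        "interfaces": ["RatingAndPace"],
    },
    "Music": {
        "members": {},
        "interfaces": ["RatingAndPace"],
    },
}


def all_members(class_name):
    """Return the class's own members plus those of its bound interfaces."""
    cls = CLASSES[class_name]
    members = dict(cls["members"])  # copy, so the definition is not modified
    for interface_name in cls["interfaces"]:
        members.update(INTERFACES[interface_name])
    return members
```

Here rating and pace are defined once in the interface, yet all_members reports them for both Movie and Music.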

Classes are arranged into the class hierarchy

Every class is arranged into the class hierarchy.

The standard class hierarchy is shown below.

+-+ Object
  |
  +-+ File
  | |
  | +-+ TextFile
  | | |
  | | +-- DOCFile   // MIME:application/msword; extension:doc
  | | |
  | | +-- HTMLFile  // MIME:text/html; extension:htm,html
  | | |
  | | +-- PDFFile   // MIME:application/pdf; extension:pdf
  | | |
  | | +-- PPTFile   // MIME:application/vnd.ms-powerpoint; extension:ppt
  | | |
  | | +-- PSFile    // MIME:application/postscript; extension:ps
  | | |
  | | +-- RTFFile   // MIME:application/rtf; extension:rtf
  | | |
  | | +-- XLSFile   // MIME:application/vnd.ms-excel; extension:xls
  | |
  | +-+ ImageFile
  | | |
  | | +-- GIFFile   // MIME:image/gif; extension:gif
  | | |  
  | | +-- JPGFile   // MIME:image/jpeg; extension:jpg,jpeg
  | | |  
  | | +-- PNGFile   // MIME:image/png; extension:png
  | |
  | +-+ AudioFile
  | | |
  | | +-- MP3File   // MIME:audio/mpeg; extension:mp3
  | | |
  | | +-- OGGFile   // MIME:application/ogg; extension:ogg
  | |
  | +-+ VideoFile
  | | |
  | | +-- AVIFile   // MIME:video/avi; extension:avi
  | | |  
  | | +-- MOVFile   // MIME:video/quicktime; extension:mov
  | | |  
  | | +-- MPGFile   // MIME:video/mpeg; extension:mpg,mpeg
  | |
  | +-- UnknownFile
  |
  +-- Directory
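
The extension annotations in the hierarchy suggest a simple lookup from file name to class. The sketch below is only an assumption about how such a lookup might work. The classify function is hypothetical, and a real implementation would probably also consider the MIME type.

```python
import os

# Extension -> leaf class, following the annotations in the hierarchy.
EXTENSION_TO_CLASS = {
    "doc": "DOCFile", "htm": "HTMLFile", "html": "HTMLFile",
    "pdf": "PDFFile", "ppt": "PPTFile", "ps": "PSFile",
    "rtf": "RTFFile", "xls": "XLSFile",
    "gif": "GIFFile", "jpg": "JPGFile", "jpeg": "JPGFile", "png": "PNGFile",
    "mp3": "MP3File", "ogg": "OGGFile",
    "avi": "AVIFile", "mov": "MOVFile", "mpg": "MPGFile", "mpeg": "MPGFile",
}


def classify(name, is_directory=False):
    """Return the name of the standard class for a file or directory name."""
    if is_directory:
        return "Directory"
    # splitext returns the extension with its leading dot, e.g. ".JPEG".
    extension = os.path.splitext(name)[1].lstrip(".").lower()
    return EXTENSION_TO_CLASS.get(extension, "UnknownFile")
```

Names without a known extension fall back to UnknownFile, so every object still gets a class.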

The standard class hierarchy can be further extended by

  • adding a new member to a class,
  • adding a new interface to a class, and
  • subclassing a class.

A class can be subclassed by creating a new child class that references an existing (parent) class. The parent class is not modified, but its data members are available in the child class, and the child can be extended with additional members.
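
Subclassing can be sketched as a chain of parent references. ClassDef is a hypothetical representation, not Depot's actual format. The width, height, and stitched members and the PanoramaFile class are made up for this example.

```python
class ClassDef:
    """A hypothetical in-memory class definition."""

    def __init__(self, name, parent=None, members=None):
        self.name = name
        self.parent = parent                # the existing class being subclassed
        self.members = dict(members or {})  # member name -> type name

    def all_members(self):
        """Return the inherited members plus this class's own members."""
        merged = self.parent.all_members() if self.parent else {}
        merged.update(self.members)
        return merged


# ImageFile and JPGFile come from the standard hierarchy; everything else
# in this example is invented for illustration.
image_file = ClassDef("ImageFile", members={"width": "int", "height": "int"})
jpg_file = ClassDef("JPGFile", parent=image_file)
panorama = ClassDef("PanoramaFile", parent=jpg_file,
                    members={"stitched": "boolean"})

print(sorted(panorama.all_members()))
```

The child sees the members of all its ancestors, while the parent definitions stay untouched.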

Stores

A store is a repository that accounts for every object of a user or a group.

A store has four distinct but related namespaces that make the above features possible:

The catalog namespace

The catalog namespace stores every object that was cataloged from tertiary storage media, typically DVDs and CDs. This namespace stores only the file-specific metadata and signatures of the cataloged files, not the files themselves. For each file, the catalog also stores the hash values used by the most popular file sharing networks, to improve availability in case a tertiary storage medium breaks.

In the catalog namespace, objects have a special member with the following definition:

enum catalog_state { linked, unlinked, deprecated }

The above enum values have the following meaning:

  • linked: The object is linked into the depot namespace.
  • unlinked: The object is not linked into the depot namespace.
  • deprecated: The object is not linked into the depot namespace because it is deprecated and has become irrelevant.

The depot namespace

The depot namespace stores every object that the user works with, whether it is currently online or offline.

In the depot namespace, objects have a special member with the following definition:

enum depot_state { online, offline, queued }

The above enum values have the following meaning:

  • online: The object is stored on the file system so it's immediately accessible and it's also backed up.
  • offline: The object is not stored on the file system, only on tertiary storage media which it is linked to through the catalog namespace.
  • queued: The object is stored on the file system so it's immediately accessible, but it's not backed up yet.
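
The three states can be read as a function of two facts about an object: whether it is on the file system, and whether it is backed up. The sketch below encodes that reading. It assumes that "backed up" means the object is stored on cataloged tertiary media, and both DepotState and depot_state are hypothetical names.

```python
from enum import Enum


class DepotState(Enum):
    ONLINE = "online"
    OFFLINE = "offline"
    QUEUED = "queued"


def depot_state(on_file_system, backed_up):
    """Derive the depot state from where the object is currently stored."""
    if on_file_system:
        # Immediately accessible; the backup decides online versus queued.
        return DepotState.ONLINE if backed_up else DepotState.QUEUED
    if backed_up:
        # Only available on tertiary storage media.
        return DepotState.OFFLINE
    # Neither on disk nor backed up: there is nothing to refer to.
    raise ValueError("object is stored nowhere")
```

Under this reading, a queued object becomes online once it is backed up, and an online object becomes offline once it is removed from the file system.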

The classes namespace

The classes namespace stores the class hierarchy of the store as a tree of objects.

The interfaces namespace

The interfaces namespace stores the interfaces of a store as a tree of objects.

Related projects

Indexing services built on top of the file system

  • Beagle is a search tool that ransacks your personal information space to find whatever you're looking for. This project primarily targets Linux. It also uses Mono and has a usable codebase to borrow from.
  • Spotlight is the search system of Mac OS X Tiger.
  • GNOME Storage is an exciting project to replace the traditional file system with a new document store. Unfortunately, this project is very immature and no longer active.
  • DBFS is a faceted system that indexes your files so that you can find them easily and efficiently. This project also seems to be immature and unmaintained.

Metadata file systems

  • WinFS is the next generation relational file system for Windows.

Metadata editors

  • Entagged is a powerful, multiplatform tag editor written in Java.
  • Tag&Rename is a very mature tagging application written for Windows.

Cataloging applications

  • WhereIsIt is a highly evolved cataloging application written for Windows.

Related libraries

  • entagged-sharp is a powerful library to extract audio metadata from files.
  • inotify is a recent service of the Linux kernel that makes it possible to monitor file system changes in real time.
  • GStreamer is a sophisticated multimedia framework that could be used for making screenshots of movies.

This page was last modified on 9 March 2006, at 10:59.