Storage model

Documentation of the PersonalMediaVault storage model.

This is the documentation for the vault storage model, including the types of files it uses, their internal structure and the encryption algorithms used.

Use this document as a reference for any software development that requires interacting with the vault files.

File types

The vault storage model uses different types of files:

  • Lock file: File used to prevent multiple instances of PersonalMediaVault accessing the same vault.
  • Unencrypted JSON files: Configuration files that do not contain any protected vault data.
  • Encrypted JSON files: Used to store metadata.
  • Index files: Used to store lists of media asset IDs, in order to make searching faster.
  • Encrypted assets: Encrypted files containing the media assets. They can be single-file or multi-file.

Lock file

The lock file has the .lock extension.

It stores, in plain text, a decimal number representing the PID of the process currently accessing the vault.

The PersonalMediaVault backend should check whether this file exists, and whether the process it references is still running, before accessing the vault.

Unencrypted JSON files

Unencrypted JSON files have the .json extension.

They follow the JSON format. The schema varies depending on the specific file.

Since they are not encrypted, they only store configuration, like the port the backend should listen on, or the encryption parameters.

Encrypted JSON files

Encrypted JSON files have the .pmv extension.

They are created by encrypting a plaintext JSON document with an algorithm such as AES.

They are binary files, with the following structure:

Starting byte | Size (bytes) | Value name | Description
0 | 2 | Algorithm ID | Identifier of the algorithm, stored as a Big Endian unsigned integer
2 | H | Header | Header containing any parameters required by the encryption algorithm. The size depends on the algorithm used.
2 + H | N | Body | Body containing the raw encrypted data. The size depends on the initial unencrypted data and the algorithm used.

The system is flexible enough to allow multiple encryption algorithms. Currently, there are 2 supported ones:

  • AES256_ZIP: ID = 1. Uses ZLIB (RFC 1950) to compress the data, and then encrypts it using AES with a 256-bit key, CBC as the mode of operation and a 128-bit IV. This algorithm uses a 20-byte header (H = 20) containing the following fields:
Starting byte | Size (bytes) | Value name | Description
2 | 4 | Compressed plaintext size | Size of the compressed plaintext, in bytes, used to remove padding
6 | 16 | IV | Initialization vector for the AES_256_CBC algorithm
  • AES256_FLAT: ID = 2. Encrypts the data using AES with a 256-bit key, CBC as the mode of operation and a 128-bit IV. This algorithm uses a 20-byte header (H = 20) containing the following fields:
Starting byte | Size (bytes) | Value name | Description
2 | 4 | Plaintext size | Size of the plaintext, in bytes, used to remove padding
6 | 16 | IV | Initialization vector for the AES_256_CBC algorithm

Index files

Index files have the .index extension.

They are sorted lists of media asset identifiers. They can store all the existing identifiers or a subset of them, for example, the identifiers of the media assets with a given tag.

Since the identifiers are sorted, a specific identifier can be found using binary search.

They are binary files, consisting of the following fields:

Starting byte | Size (bytes) | Value name | Description
0 | 8 | Index size | Number of entries the index file contains, stored as a Big Endian unsigned integer
8 + 8*K | 8 | Media asset identifier | Identifier of entry K (starting at K = 0), stored as a Big Endian unsigned integer. Entries are stored next to each other, already sorted from lowest to greatest value

Encrypted assets

Encrypted assets have the .pma extension.

They store one or multiple encrypted files.

They are also binary files, and they can be of two types:

Single-File encrypted assets

These asset files are used to store a single, possibly large, file split into chunks, each chunk encrypted using the method described in the Encrypted JSON files section.

They are binary files consisting of 3 contiguous sections: The header, the chunk index and the encrypted chunks.

The header contains the following fields:

Starting byte | Size (bytes) | Value name | Description
0 | 8 | File size | Size of the original file, in bytes, stored as a Big Endian unsigned integer
8 | 8 | Chunk size limit | Max size of a chunk, in bytes, stored as a Big Endian unsigned integer

After the header, the chunk index is stored. For each chunk the file was split into, the chunk index stores a 16-byte metadata entry with the following fields (starting bytes are relative to the start of the entry):

Starting byte | Size (bytes) | Value name | Description
0 | 8 | Chunk pointer | Starting byte of the chunk, stored as a Big Endian unsigned integer
8 | 8 | Chunk size | Size of the chunk, in bytes, stored as a Big Endian unsigned integer

After the chunk index, the encrypted chunks are stored following the same structure described in the Encrypted JSON files section.

This chunked structure allows random access to any point in the file at a low cost, since only the corresponding chunks need to be decrypted, not the entire file. This capability is especially useful for video rewinding and seeking.

Multi-File encrypted assets

These asset files are used to store multiple smaller files, meant to be sorted and accessed by an index number.

They are binary files consisting of 3 contiguous sections: The header, the file table and the encrypted files.

The header contains the following fields:

Starting byte | Size (bytes) | Value name | Description
0 | 8 | File count | Number of files stored by the asset, stored as a Big Endian unsigned integer

After the header, a file table is stored. For each file stored by the asset, a 16-byte metadata entry is stored, with the following fields (starting bytes are relative to the start of the entry):

Starting byte | Size (bytes) | Value name | Description
0 | 8 | File data pointer | Starting byte of the file encrypted data, stored as a Big Endian unsigned integer
8 | 8 | File size | Size of the encrypted file, in bytes, stored as a Big Endian unsigned integer

After the file table, each file is stored following the same structure described in the Encrypted JSON files section.

This format is useful for storing video previews without creating a separate file for each one.

Vault folder structure

Media vaults are stored in folders. A vault folder may contain the following files and folders:

Name | Path | Type | Description
Media assets | media | Folder | Folder where media assets are stored.
Tag indexes | tags | Folder | Folder where tag indexes are stored.
Lock file | vault.lock | Lock file | File used to prevent multiple instances of the PersonalMediaVault backend from accessing a vault at the same time. It may not be present, in case the vault is not being accessed.
Credentials file | credentials.json | Unencrypted JSON file | File to store the existing accounts, along with the hashed credentials and the encrypted vault key, protected with the account password.
Media ID tracker | media_ids.json | Unencrypted JSON file | File to store the next media asset ID to use.
Tasks tracker | tasks.json | Unencrypted JSON file | File used to store the next task ID to use, along with the list of pending tasks.
Albums | albums.pmv | Encrypted JSON file | File used to store the existing albums, including the metadata and the list of media assets included in them.
Tag list | tag_list.pmv | Encrypted JSON file | File to store the metadata of the existing vault tags.
User configuration | user_config.pmv | Encrypted JSON file | File to store user configuration, like the vault title or the encoding parameters.
Main index | main.index | Index file | File to index every single media asset existing in the vault.

Media assets folder

The media assets are stored inside the media folder.

To prevent any single folder from growing too large, the assets are distributed evenly across 256 sub-folders. The sub-folder name for each media asset is calculated from its identifier, a 64-bit unsigned integer: the identifier modulo 256, turned into a 2-character lowercase hexadecimal string.

Examples: 00, 01, 02…, fd, fe, ff.

Inside each subfolder, the assets are stored inside their own folders, named by turning their identifier into a decimal string. Examples:

  • media_id=0 - Stored inside {VAULT_FOLDER}/media/00/0
  • media_id=15 - Stored inside {VAULT_FOLDER}/media/0f/15
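import (
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// GetMediaAssetFolder returns the folder of the media asset with the given identifier
func GetMediaAssetFolder(vaultPath string, mediaID uint64) string {
	// Sub-folder name: identifier modulo 256, as a 2-character lowercase hex string
	subFolderName := hex.EncodeToString([]byte{byte(mediaID % 256)})

	// Asset folder name: identifier as a decimal string
	return filepath.Join(vaultPath, "media", subFolderName, fmt.Sprint(mediaID))
}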

The media asset folder may contain up to 3 types of files:

Media asset metadata file

Each media asset folder must contain a file named meta.pmv: an encrypted JSON file containing the metadata of the media asset.

The file contains the following fields:

Field name | Type | Description
id | Number (64 bit unsigned integer) | Media asset identifier
type | Number (8 bit unsigned integer) | Media type. Can be: 1 (Image), 2 (Video / Animation) or 3 (Audio / Sound)
title | String | Title
description | String | Description
tags | Array<Number (64 bit unsigned integer)> | List of tags for the media. Only identifiers are stored
duration | Number (Floating point) | Duration of the media in seconds
width | Number (32 bit unsigned integer) | Width in pixels
height | Number (32 bit unsigned integer) | Height in pixels
fps | Number (32 bit unsigned integer) | Frames per second
upload_time | Number (64 bit integer) | Upload timestamp (Unix milliseconds format)
next_asset_id | Number (64 bit unsigned integer) | Identifier to use for the next asset, when created
original_ready | Boolean | True if the original asset exists and is ready
original_asset | Number (64 bit unsigned integer) | Asset ID of the original asset. The original asset is Single-File
original_ext | String | Extension of the original asset file. Example: mp4
original_encoded | Boolean | True if the original asset is encoded
original_task | Number (64 bit unsigned integer) | If the original asset is not encoded, the ID of the task assigned to encode it
thumb_ready | Boolean | True if the thumbnail asset exists and is ready
thumb_asset | Number (64 bit unsigned integer) | Asset ID of the thumbnail asset. The thumbnail asset is Single-File
previews_ready | Boolean | True if the video previews asset exists and is ready
previews_asset | Number (64 bit unsigned integer) | Asset ID of the video previews asset. The video previews asset is Multi-File
previews_interval | Number (Floating point) | Video previews interval in seconds
previews_task | Number (64 bit unsigned integer) | If the video previews asset is not ready, the ID of the task assigned to generate it
force_start_beginning | Boolean | If true, the player does not store the current playing time, so the video or audio starts from the beginning every time
is_anim | Boolean | If true, the player treats the media as an animation
img_notes | Boolean | True if the image has a notes asset
img_notes_asset | Number (64 bit unsigned integer) | Asset ID of the image notes asset. The image notes asset is Single-File
resolutions | Array<Resolution> | List of extra resolutions
subtitles | Array<Subtitle> | List of subtitle files
time_splits | Array<TimeSplit> | List of time splits for videos or audios
audio_tracks | Array<AudioTrack> | List of extra audio tracks for videos
attachments | Array<Attachment> | List of attachments stored with the media asset

The Resolution object has the following fields:

Field name | Type | Description
width | Number (32 bit unsigned integer) | Width in pixels
height | Number (32 bit unsigned integer) | Height in pixels
fps | Number (32 bit unsigned integer) | Frames per second
ready | Boolean | True if the asset is ready
asset | Number (64 bit unsigned integer) | Asset ID of the asset. The asset is Single-File
ext | String | Asset file extension. Example: mp4
task_id | Number (64 bit unsigned integer) | If the asset is not ready, ID of the task assigned to encode it

The Subtitle object has the following fields:

Field name | Type | Description
id | String | Subtitles language identifier. Example: eng
name | String | Subtitles file name. Example: English
asset | Number (64 bit unsigned integer) | Asset ID of the asset. The asset is Single-File

The TimeSplit object has the following fields:

Field name | Type | Description
time | Number (Floating point) | Time in seconds where the split starts
name | String | Name of the time split

The AudioTrack object has the following fields:

Field name | Type | Description
id | String | Audio track language identifier. Example: eng
name | String | Audio track file name. Example: English
asset | Number (64 bit unsigned integer) | Asset ID of the asset. The asset is Single-File

The Attachment object has the following fields:

Field name | Type | Description
id | Number (64 bit unsigned integer) | Unique attachment identifier
name | String | Attachment file name
size | Number (64 bit unsigned integer) | Attachment file size (in bytes)
asset | Number (64 bit unsigned integer) | Asset ID of the asset. The asset is Single-File

The image notes asset is a JSON file, containing an array of ImageNote objects, with the following fields:

Field name | Type | Description
x | Number (32 bit integer) | X position (pixels)
y | Number (32 bit integer) | Y position (pixels)
w | Number (32 bit integer) | Width (pixels)
h | Number (32 bit integer) | Height (pixels)
text | String | Text to display for the specified area

Tag indexes folder

When a tag is added to the vault, a new index file is created inside the tags folder, with a name made by concatenating the tag_ prefix with the tag identifier encoded in decimal, and the .index extension.

import (
	"fmt"
	"path/filepath"
)

// GetTagIndexPath returns the path of the index file for the tag with the given identifier
func GetTagIndexPath(vaultPath string, tagID uint64) string {
	return filepath.Join(vaultPath, "tags", "tag_"+fmt.Sprint(tagID)+".index")
}

Each tag index file contains the list of identifiers of the media assets that have that tag.

Credentials file

The credentials file, named credentials.json, is an unencrypted JSON file used to store the hashed credentials, along with the encrypted vault key.

The JSON file contains the following fields:

Field name | Type | Description
user | String | Username of the root account
pwhash | String | Password hash. Base 64 encoded
salt | String | Hashing salt. Base 64 encoded
enckey | String | Encrypted key. Base 64 encoded
method | String | Name of the hashing + encryption method used
fingerprint | String | Vault fingerprint
accounts | Array<Account> | Array of additional accounts

Each Account is an object with the following fields:

Field name | Type | Description
user | String | Account username
pwhash | String | Password hash. Base 64 encoded
salt | String | Hashing salt. Base 64 encoded
enckey | String | Encrypted key. Base 64 encoded
method | String | Name of the hashing + encryption method used
write | Boolean | True if the account has permission to modify the vault
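A sketch of reading credentials.json to obtain the salt and password hash of a given user, checking the root account first and then the additional accounts. It assumes the Base 64 fields use standard encoding with padding (base64.StdEncoding); Credentials, Account and LookupAccount are hypothetical names.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
)

// Account mirrors an entry of the "accounts" array
type Account struct {
	User   string `json:"user"`
	PwHash string `json:"pwhash"` // Base 64
	Salt   string `json:"salt"`   // Base 64
	EncKey string `json:"enckey"` // Base 64
	Method string `json:"method"`
	Write  bool   `json:"write"`
}

// Credentials mirrors credentials.json
type Credentials struct {
	User        string    `json:"user"`
	PwHash      string    `json:"pwhash"`
	Salt        string    `json:"salt"`
	EncKey      string    `json:"enckey"`
	Method      string    `json:"method"`
	Fingerprint string    `json:"fingerprint"`
	Accounts    []Account `json:"accounts"`
}

// LookupAccount returns the decoded salt and password hash of the given user
func LookupAccount(data []byte, user string) (salt, pwHash []byte, found bool, err error) {
	var creds Credentials
	if err = json.Unmarshal(data, &creds); err != nil {
		return nil, nil, false, err
	}
	encSalt, encHash := "", ""
	if creds.User == user {
		encSalt, encHash, found = creds.Salt, creds.PwHash, true
	} else {
		for _, acc := range creds.Accounts {
			if acc.User == user {
				encSalt, encHash, found = acc.Salt, acc.PwHash, true
				break
			}
		}
	}
	if !found {
		return nil, nil, false, nil
	}
	if salt, err = base64.StdEncoding.DecodeString(encSalt); err != nil {
		return nil, nil, false, err
	}
	if pwHash, err = base64.StdEncoding.DecodeString(encHash); err != nil {
		return nil, nil, false, err
	}
	return salt, pwHash, true, nil
}
```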

Currently, the following methods are implemented:

AES256 + SHA256 + SALT16

This algorithm uses a random salt of 16 bytes (128 bits).

The password hash is calculated by using the SHA256 algorithm 2 times on the binary concatenation of the password (as UTF-8) and the random salt:

import (
	"crypto/sha256"
)

// ComputePasswordHash computes SHA256(SHA256(password || salt))
func ComputePasswordHash(password string, salt []byte) []byte {
	firstHash := sha256.Sum256(append([]byte(password), salt...))
	secondHash := sha256.Sum256(firstHash[:])
	return secondHash[:]
}

The vault key is encrypted with AES256, using the system defined in the Encrypted JSON files section, specifically the AES256_FLAT mode.

The encryption key is calculated by hashing, with SHA256, the binary concatenation of the password (as UTF-8) and the random salt:

import (
	"crypto/sha256"
)

// ComputeAESEncryptionKey computes SHA256(password || salt), used as the AES-256 key
func ComputeAESEncryptionKey(password string, salt []byte) []byte {
	passwordHash := sha256.Sum256(append([]byte(password), salt...))
	return passwordHash[:]
}

Media ID tracker

The media ID tracker file, named media_ids.json, is an unencrypted JSON file used to keep track of the media identifiers already in use, which is essential to prevent duplicated identifiers.

The JSON file has just one field:

Field name | Type | Description
next_id | Number (64 bit unsigned integer) | Next identifier to use when adding a new media asset

Tasks tracker

The tasks tracker file, named tasks.json, is an unencrypted JSON file used to keep track of the task identifiers already in use, in order to prevent duplicates. It also stores the pending tasks, so they can be resumed after a vault restart.

The JSON file contains the following fields:

Field name | Type | Description
next_id | Number (64 bit unsigned integer) | Next identifier to use when creating a new task
pending | Object (Mapping String -> PendingTask) | Mapping. For each pending task, the required metadata to restart it

The PendingTask objects have the following fields:

Field name | Type | Description
id | Number (64 bit unsigned integer) | Task identifier
media_id | Number (64 bit unsigned integer) | Media asset ID
type | Number (8 bit unsigned integer) | Task type. It can be: 0 (Encode original), 1 (Encode extra resolution) or 2 (Generate video previews)
first_time_enc | Boolean | True if this task is the first time the asset is being encoded (it was just uploaded)
resolution | Object { width: Width (px), height: Height (px), fps: Frames per second } | Resolution for type = 1
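A sketch of the tasks tracker in Go, used here to list the pending tasks of one media asset, for instance when resuming work after a restart. Type and function names are hypothetical. The resolution field is modeled as a pointer because it only applies to type = 1 tasks, and its width, height and fps are assumed to be 32-bit unsigned integers, matching the other resolution objects.

```go
package main

import "encoding/json"

// TaskResolution is the resolution object of type = 1 tasks
type TaskResolution struct {
	Width  uint32 `json:"width"`
	Height uint32 `json:"height"`
	Fps    uint32 `json:"fps"`
}

// PendingTask mirrors a value of the "pending" mapping
type PendingTask struct {
	ID           uint64          `json:"id"`
	MediaID      uint64          `json:"media_id"`
	Type         uint8           `json:"type"` // 0 = encode original, 1 = extra resolution, 2 = video previews
	FirstTimeEnc bool            `json:"first_time_enc"`
	Resolution   *TaskResolution `json:"resolution"` // nil when absent
}

// TasksTracker mirrors tasks.json
type TasksTracker struct {
	NextID  uint64                 `json:"next_id"`
	Pending map[string]PendingTask `json:"pending"`
}

// PendingForMedia lists the pending tasks of the given media asset
func PendingForMedia(data []byte, mediaID uint64) ([]PendingTask, error) {
	var tracker TasksTracker
	if err := json.Unmarshal(data, &tracker); err != nil {
		return nil, err
	}
	var result []PendingTask
	for _, task := range tracker.Pending {
		if task.MediaID == mediaID {
			result = append(result, task)
		}
	}
	return result, nil
}
```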

Albums file

The albums file, named albums.pmv, is an encrypted JSON file used to store the list of existing albums in the vault.

The file has the following fields:

Field name | Type | Description
next_id | Number (64 bit unsigned integer) | Identifier to use for the next album, when creating a new one.
albums | Object { Mapping ID -> Album } | List of albums. For each album, it maps its identifier to its metadata

The Album object has the following fields:

Field name | Type | Description
name | String | Name of the album
lm | Number (64 bit integer) | Last modified timestamp. Unix milliseconds format
list | Array<Number (64 bit unsigned integer)> | List of media asset identifiers contained in the album
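Since JSON object keys are always strings, the album identifiers appear as decimal strings in the file; Go's encoding/json can decode them directly into a map with uint64 keys. The sketch below, with hypothetical names, finds the albums containing a given media asset, assuming the input is the decrypted JSON plaintext of albums.pmv.

```go
package main

import (
	"encoding/json"
	"sort"
)

// Album mirrors an album entry
type Album struct {
	Name         string   `json:"name"`
	LastModified int64    `json:"lm"` // Unix milliseconds
	List         []uint64 `json:"list"`
}

// AlbumsFile mirrors albums.pmv (after decryption)
type AlbumsFile struct {
	NextID uint64           `json:"next_id"`
	Albums map[uint64]Album `json:"albums"` // Keys are decimal strings in the JSON
}

// AlbumsContaining returns, in ascending order, the IDs of the albums that include mediaID
func AlbumsContaining(plaintext []byte, mediaID uint64) ([]uint64, error) {
	var file AlbumsFile
	if err := json.Unmarshal(plaintext, &file); err != nil {
		return nil, err
	}
	var ids []uint64
	for id, album := range file.Albums {
		for _, m := range album.List {
			if m == mediaID {
				ids = append(ids, id)
				break
			}
		}
	}
	// Map iteration order is random, so sort for a stable result
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids, nil
}
```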

Tags file

The tags file, named tag_list.pmv, is an encrypted JSON file used to store the list of existing tags in the vault.

The file has the following fields:

Field name | Type | Description
next_id | Number (64 bit unsigned integer) | Identifier to use for the next tag, when creating a new one.
tags | Object { Mapping ID -> String } | List of tags. For each tag, it maps its identifier to its name

User configuration file

The user configuration file, named user_config.pmv, is an encrypted JSON file used to store the vault configuration set by the user.

The file has the following fields:

Field name | Type | Description
title | String | Vault custom title
css | String | Custom CSS for the frontend
max_tasks | Number (32 bit integer) | Max number of tasks to run in parallel
encoding_threads | Number (32 bit integer) | Max number of threads to use for a single encoding task
video_previews_interval | Number (32 bit integer) | Video previews interval (seconds)
resolutions | Array<VideoResolution> | Resolutions to automatically encode when uploading a video
image_resolutions | Array<ImageResolution> | Resolutions to automatically encode when uploading an image

The VideoResolution object has the following fields:

Field name | Type | Description
width | Number (32 bit unsigned integer) | Width in pixels
height | Number (32 bit unsigned integer) | Height in pixels
fps | Number (32 bit unsigned integer) | Frames per second

The ImageResolution object has the following fields:

Field name | Type | Description
width | Number (32 bit unsigned integer) | Width in pixels
height | Number (32 bit unsigned integer) | Height in pixels
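A Go representation of the user configuration, as a sketch with hypothetical type and function names; field types follow the tables, and the input is assumed to be the decrypted JSON plaintext of user_config.pmv.

```go
package main

import "encoding/json"

type VideoResolution struct {
	Width  uint32 `json:"width"`
	Height uint32 `json:"height"`
	Fps    uint32 `json:"fps"`
}

type ImageResolution struct {
	Width  uint32 `json:"width"`
	Height uint32 `json:"height"`
}

// UserConfig mirrors user_config.pmv (after decryption)
type UserConfig struct {
	Title                 string            `json:"title"`
	CSS                   string            `json:"css"`
	MaxTasks              int32             `json:"max_tasks"`
	EncodingThreads       int32             `json:"encoding_threads"`
	VideoPreviewsInterval int32             `json:"video_previews_interval"` // Seconds
	Resolutions           []VideoResolution `json:"resolutions"`
	ImageResolutions      []ImageResolution `json:"image_resolutions"`
}

// ParseUserConfig decodes the decrypted JSON of user_config.pmv
func ParseUserConfig(plaintext []byte) (*UserConfig, error) {
	var config UserConfig
	if err := json.Unmarshal(plaintext, &config); err != nil {
		return nil, err
	}
	return &config, nil
}
```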

Main index file

The main index file, named main.index, is an index file containing every single media asset identifier existing in the vault.

This file is used to check if a media asset exists and to perform searches when a tag filter is not specified.

Last modified October 19, 2024: Update pmv-cli manual (2ecfe71)