Introduction

Perst is a very simple object-oriented embedded database. It is easy to use and provides high performance. It is intended to be used in applications that need to deal with persistent data in a more sophisticated way than the load/store object tree provided by a standard serialization mechanism. Although Perst is very simple, it provides fault-tolerant support (ACID transactions) and concurrent access to the database.

The main advantage of Perst is its tight integration with programming languages. There is no gap between the database and the application data models - Perst directly stores language objects, so there is no need to write the packing/unpacking code that traditional relational databases require. Also, Perst (unlike many other OODBMSs) requires no special compiler or preprocessor. Yet, it is able to provide a high level of transparency.

Features

Let us now describe the key features of the Perst architecture.

Persistence by reachability

Perst implements the persistence by reachability approach. An object of any class derived from the Persistent base class is considered persistence-capable. It is automatically made persistent and stored in the storage when it is referenced from some other persistent object and when the store method of that object is invoked. So there is no need (but it is possible) to explicitly assign objects to the storage.

The storage has one special root object. The root object is the only persistent object accessed in a special way (using Storage.getRoot method). All other persistent objects are accessed in the normal way: either by traversing by reference from another persistent object, or by using object containers (Index, Link or Relation). Unlike many other OODBMS, there can be only one root in the storage. If you need to have several named roots, you should create an Index object and use it as the root object.
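For example, opening a storage and installing an Index as the root object (emulating several named roots, as suggested above) might be sketched as follows. The file name and page pool size are arbitrary choices for illustration:

```java
import org.garret.perst.*;

public class OpenDb {
    public static void main(String[] args) {
        Storage db = StorageFactory.getInstance().createStorage();
        db.open("mydb.dbs", 32 * 1024 * 1024);        // 32MB page pool
        Index root = (Index)db.getRoot();
        if (root == null) {                           // first run: no root yet
            root = db.createIndex(String.class, true); // index of "named roots"
            db.setRoot(root);                          // register root in the storage
        }
        // root.put("employees", ...) etc. can now emulate multiple named roots
        db.close();
    }
}
```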

Before version 4.0, Perst required that each persistence-capable class be derived from the Persistent class. This made it impossible to store ‘foreign’ classes in the storage, and was the ‘price to pay’ for Perst’s ease of use and the absence of any specialized preprocessors or compilers.

Starting from version 4.0, Perst allows ANY class, even one that does not implement the IPersistent interface, to be stored in the database. It is still possible to derive persistent classes from Persistent, and such classes will even be handled more efficiently than "foreign" classes not derived from Persistent. The new rules for persistence-capable objects are as follows:

Components of persistence-capable objects are restricted to the following types:

Scalar types
Any valid Java scalar type: boolean, integer or real. For example, boolean, int, short, double, ...
String type
java.lang.String type
Date type
java.util.Date type
Reference to the persistent object
Any class inherited from Persistent or any interface extending the IPersistent interface.
Value type
Any class implementing the IValue interface with the same restrictions on the types of components as for persistence-capable objects. If perst.implicit.values property is true, then any class which is not derived from IPersistent will be treated as a value. Values are stored inside the persistent object containing them. The value class should have a default constructor (a constructor with an empty parameter list) or no constructors at all. The declared type of the field referencing value should be the same as the actual type of referenced object (so it is not possible to assign an object of the derived class to this field). Also, the value class should not have recursive references (direct or indirect) – e.g. field of value class V cannot have type V.
Array type
A one-dimensional array with the type of components as described above.
Link type
One-to-many link or, from an implementation point of view, a dynamic array of persistent objects. Links are created using the Storage.createLink method.
Standard Java collections classes
Collection and map classes from the java.util package are stored as embedded collections inside the persistent object referencing them. Such collections and maps are not independent persistent objects: they are not assigned an OID and cannot be referenced from more than one persistent object. Collection and map classes defined by applications are treated as any other persistent objects unless you set the "perst.serialize.system.collections" property to false.
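Putting these rules together, a persistence-capable class might look as follows. This is a sketch: the class and field names are invented for illustration, and IValue is used as the marker for an embedded value type:

```java
import org.garret.perst.*;
import java.util.Date;

// Value type: stored inside the owning object, no OID of its own
class Address implements IValue {
    String city;
    String street;
}

class Person extends Persistent {
    String  name;       // string type
    int     age;        // scalar type
    Date    birthday;   // date type
    Address address;    // value type (embedded)
    Person  spouse;     // reference to another persistent object
    int[]   scores;     // one-dimensional array of scalars
    Link    children;   // one-to-many link, created by Storage.createLink()
}
```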

Unfortunately, it is not possible to detect if an object is changed or not without saving the old state of the object and performing a field-by-field comparison with the new state of the object. But, the overhead of such a solution (both in terms of space and CPU resources) is very high. In Perst it is the responsibility of the programmer to save objects in the storage. This can be done by the Persistent.store or Persistent.modify methods.

The Persistent.store method writes an object, as well as all objects which are not yet persistent but are referenced from this object, to the storage. So, if you create a tree of objects and assign a reference to the root of this tree to some persistent object X, it is only necessary to invoke the store() method on this object X. But, if you later update one of the elements in this tree, you should invoke store() individually for each such element (X.store() will NOT automatically save objects which are already persistent).

The Persistent.modify method marks an object as modified but doesn't immediately write it to the storage. All objects marked as modified will be stored in the storage during transaction commit (store method will be invoked for each modified object). So, using the modify method is preferable if an object is updated multiple times within a transaction. In this case, instead of storing it several times, it will be stored only once - at the end.
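The difference between the two methods can be sketched as follows (person, counter and db are hypothetical variables; Counter is assumed to be a persistence-capable class):

```java
// store(): the object is written to the storage immediately
person.age += 1;
person.store();

// modify(): the object is only marked dirty; it is written once,
// when the enclosing transaction commits
for (int i = 0; i < 1000; i++) {
    counter.value += 1;
    counter.modify();
}
db.commit();   // the counter is stored here, exactly once
```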

Perst doesn't support nested non-static classes. The reason is that a non-static inner class contains a final field pointing to the instance of the outer class. Since this field is final, it is not possible to assign a value to it during object loading in the same way as for all other fields of a persistent object.

Semitransparent object loading

Perst does not use any special compiler or preprocessor. Since Java doesn't provide runtime behavior reflection (changing behavior of an object at runtime), it is not possible to implement completely transparent persistence (where there are no differences between accesses to transient and persistent objects). Instead of that, Perst proposes transparent behavior in most cases with some exceptions.

The IPersistent interface declares the recursiveLoading method. The default implementation of this method in the Persistent class always returns true. In this case, Perst will recursively load any object referenced by a target object when the target object is loaded. Thus, it causes implicit loading of the cluster of all referenced objects from storage to the main memory. This is similar to how a serialization mechanism works.

To avoid overflowing main memory by recursively loading all objects from the storage, a programmer has to override the recursiveLoading method in some classes to return false. Objects referenced by such an object will not be implicitly loaded and the programmer has to explicitly invoke the Persistent.load method to load them. So, the recursiveLoading method can be used to control loading of objects from storage to main memory.

Also, it is important to notice that containers (Link, Relation, Index) always load member objects on demand (they do not perform automatic loading of all objects in the containers). Since access to the container members is always performed through methods of the container class, the programmer will never notice it - container methods will always return loaded object(s).
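A sketch of explicit load control for a linked list, where recursiveLoading is overridden so that fetching one node does not pull the whole chain into memory (Node and the traversal loop are illustrative):

```java
import org.garret.perst.*;

class Node extends Persistent {
    Node next;
    int  value;

    // Objects referenced by this node (next) are not loaded implicitly
    public boolean recursiveLoading() {
        return false;
    }
}

// The traversal now has to load each node explicitly:
for (Node n = head; n != null; n = n.next) {
    n.load();               // fetch the node's state from the storage
    System.out.println(n.value);
}
```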

Perst uses a default constructor (constructor without parameters) to create an object loaded from storage. This means that:

  1. All persistence-capable classes should have a default constructor (or no constructor at all, in which case one will be automatically generated by the compiler). The default constructor need not be public; it can have any access type. Perst is also able to generate default constructors for persistent classes using sun.reflect.ReflectionFactory, but this works only with Sun's JDK.
  2. The default constructor should initialize only the transient fields.
  3. The default constructor is used to create an instance of the persistent object loaded from the storage. So, at the time of execution of the default constructor, the fields of the constructed object are not yet assigned the values stored in the database. If you need these values to be able to perform initialization of the transient fields, then you need to perform this initialization by the onLoad method which is invoked after fetching all values of the non-transient fields of the persistent object from the storage.

So, summarizing all of the above, the proposed mechanism is convenient and easy to use because it doesn't require the programmer to explicitly load any referenced object. From another point of view, it is flexible by providing the programmer control over object loading. Only those classes (usually containers) which explicitly control loading of their members (by overriding recursiveLoading to return a false value) should be aware of calling the Persistent.load method.

Automatic schema evolution

Perst supports lazy automatic schema evolution. When a class format is changed, Perst performs a conversion of the loaded object to the new format. If this object is modified, it will be stored in the new format. So, the object is converted to the new format on demand. Perst is able to handle the following schema changes:
  1. Compatible change of scalar type (change which cannot cause data truncation). For example, it is possible to change the int type to long or to float. But changing the type from int to short is not possible. More precisely, Perst is able to perform any conversion which is supported by Java reflection mechanism (field type XXX can be changed to YYY if the java.lang.reflect.Field.setXXX method can be applied to the component with type YYY).
  2. Reordering of components within a class or moving the components to a base or derived class.
  3. Adding/removing of classes from the class inheritance hierarchy.
  4. Changing the format of classes with value semantics.

All other schema modifications (such as renaming fields, incompatible change of field type) cannot be handled by the Perst automatic schema modification mechanism. In this case, you can use the Perst XML export/import mechanism. A database can be exported to XML format using the Storage.exportXML method and then recreated with new class definitions. After it is saved, data can be imported using the Storage.importXML method.
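The export/import cycle described above might be sketched as follows (the file names are arbitrary, and db is assumed to be an opened Storage):

```java
// 1. Export the database using the old class definitions
java.io.Writer writer = new java.io.FileWriter("dump.xml");
db.exportXML(writer);
writer.close();
db.close();

// 2. Recreate the storage with the new class definitions,
//    then import the saved data
db.open("newdb.dbs");
java.io.Reader reader = new java.io.FileReader("dump.xml");
db.importXML(reader);
reader.close();
```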

Relations

Java references provide a way to implement one-to-one relation between objects. But in many cases, one-to-many and many-to-many relations are needed. Perst provides the Link interface to deal with relations of this kind. This interface allows you to add/delete/inspect members of the relation. Members of the relation can be accessed by index or be extracted as an array of objects.

Relations can be of two kinds: embedded (where references to the related objects are stored in relation to the owner object itself) and standalone (where the relation is a separate object, which contains the references to the relation owner and the relation members). Both kinds of relations implement the Link interface. An embedded relation is created by the Storage.createLink method and a standalone relation is represented by the persistent class created by the Storage.createRelation method.

So, a relation between two classes A and B can be implemented in the following way:

Relation type     Object A        Object B
one-to-one        B bref;         A aref;
one-to-many       Link bref;      A aref;
many-to-one       B bref;         Link aref;
many-to-many      Link bref;      Link aref;
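For example, the one-to-many case from the table above (each vendor owns many orders) might be sketched like this, assuming db is an opened Storage:

```java
import org.garret.perst.*;

class Vendor extends Persistent {
    String name;
    Link   orders;      // embedded one-to-many relation
}

class Order extends Persistent {
    Vendor vendor;      // back reference to the relation owner
    int    price;
}

// Creating the relation:
Vendor v = new Vendor();
v.orders = db.createLink();
Order o = new Order();
o.vendor = v;
v.orders.add(o);
v.store();              // stores v, o and the embedded link
```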

Indices

Usually objects are accessed by traversing from one object to another using references or links. But it is frequently necessary to locate an object by its key. In JDK, the Hashtable or the HashMap class can be used for this purpose. In databases, a more sophisticated search is usually required. I do not want to implement the complete SQL language in Perst, because that would immediately make the DBMS huge and slow. But, in most cases, an application performs only very simple queries using an exact match or a key range. This is done in Perst by the Index and FieldIndex interfaces. The first interface is used to specify a key and its associated value independently. The FieldIndex interface allows you to index objects using one of the fields of the object (the key field).

Indices are created in Perst using the Storage.createIndex or the Storage.createFieldIndex methods. There can be several index implementations but, right now, only one implementation based on B+Tree is provided (because B+Tree is the most efficient structure for disk-based databases). Methods of the Index and the FieldIndex interfaces allow you to add, remove and locate objects by the key. It is possible to perform a search either by specifying the exact key value or by specifying a range of key values (the high or low boundary or both of them can be skipped or can be declared as being exclusive or inclusive). So it is possible to perform the following types of search:

  1. key equals VAL
  2. key belongs to [MIN_VAL, MAX_VAL]
  3. key belongs to [MIN_VAL, MAX_VAL)
  4. key belongs to (MIN_VAL, MAX_VAL]
  5. key belongs to (MIN_VAL, MAX_VAL)
  6. key is greater than MIN_VAL
  7. key is greater than or equal to MIN_VAL
  8. key is less than MAX_VAL
  9. key is less than or equal to MAX_VAL

There are several different ways of selecting objects using index:

IPersistent get(Key key)
Get an object by its key. This method should be used for unique indices, to locate an object by the exact key value.
Object[] get(Key from, Key till)
Get an array of objects with keys belonging to the specified range. Either the ‘from’ boundary, the ‘till’ boundary or both can be null. Both boundaries can be inclusive or exclusive.
Iterator iterator()
Get an iterator which will traverse all objects in the index in the ascending order of their keys.
Iterator iterator(Key from, Key till, int order)
Get an iterator for objects with keys belonging to the specified range. Either the ‘from’ boundary, the ‘till’ boundary or both can be null. Both boundaries can be inclusive or exclusive. Objects can be traversed in the ascending or descending order of their keys.
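A sketch of typical index usage, assuming db is an opened Storage and Order is a hypothetical persistent class with an int price field. The Key(value, inclusive) constructor and the Index.DESCENT_ORDER constant are assumed from the Perst API:

```java
import org.garret.perst.*;
import java.util.Iterator;

FieldIndex priceIndex = db.createFieldIndex(Order.class, "price", false);
priceIndex.put(order);                    // the key value is taken from order.price

// key belongs to [100, 200): inclusive low bound, exclusive high bound
Object[] selected = priceIndex.get(new Key(100), new Key(200, false));

// traverse all orders in descending order of price
Iterator it = priceIndex.iterator(null, null, Index.DESCENT_ORDER);
```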

If you need a set of persistent objects you should use the Storage.createSet method. Set is implemented using B+Tree where the object id (OID) is used as a key.

Perst also supports spatial indices (org.garret.perst.SpatialIndex) and generic indices with a user-defined comparator (org.garret.perst.SortedCollection). The spatial index is implemented using Guttman's R-Tree with quadratic split algorithm. It allows for efficient search of R2 objects. Sorted Collection provides almost the same methods as FieldIndex but it uses a user-defined comparator to compare collection members. A sorted collection is implemented using a T-Tree and is especially efficient for main-memory databases.

The list below summarizes information about all indices supported by Perst (for each interface, the creation method is shown on the first line, followed by the description, key type and implementation):

Index: Storage.createIndex(Class type, boolean unique)
Index with an explicitly specified key, used for exact match or range queries. Key type: scalar, string or reference. Implementation: B+Tree.

Index: Storage.createThickIndex(Class type)
The same as above, but assuming that there can be many duplicate key values. Key type: scalar, string or reference. Implementation: B+Tree.

Index: Storage.createRandomAccessIndex(Class type, boolean unique)
Random access index optimized for accessing elements both by key and by position. Key type: scalar, string or reference. Implementation: B+Tree.

FieldIndex: Storage.createFieldIndex(Class type, String fieldName, boolean unique)
Index constructed for one of the object's fields. Key type: scalar, string or reference. Implementation: B+Tree.

FieldIndex: Storage.createRandomAccessFieldIndex(Class type, String fieldName, boolean unique)
Random access field index optimized for accessing elements both by key and by position. Key type: scalar, string or reference. Implementation: B+Tree.

MultidimensionalIndex: Storage.createMultidimensionalIndex(Class cls, String[] fieldNames) or Storage.createMultidimensionalIndex(MultidimensionalComparator comparator)
A multidimensional index allows objects to be selected using search conditions on several fields. It provides better performance than traditional indices when no single field in the query has good selectivity: with traditional indices (like a B-Tree) it is necessary to join large result sets or perform a sequential search among a large number of objects, and a compound index can be used only if all keys (or at least prefix keys) are specified. A multidimensional index performs the search taking all specified restrictions into account simultaneously. But since it is implemented using a binary tree whose nodes contain references to the indexed objects, this index is efficient mostly for in-memory databases, or when all traversed objects fit in memory. Key type: scalar, string or any other comparable type. Implementation: KD-Tree.

BitIndex: Storage.createBitIndex()
Bit index for searching objects by a bitmap of properties. Key type: persistent object. Implementation: B+Tree.

IPersistentSet: Storage.createSet()
Set of persistent objects. Key type: persistent object. Implementation: B+Tree.

IPersistentSet: Storage.createScalableSet()
Scalable set of persistent objects (can efficiently handle both small and large numbers of members). Key type: persistent object. Implementation: Link or B+Tree.

IPersistentList: Storage.createList()
List of persistent objects providing random access. Key type: persistent object. Implementation: B+Tree.

IPersistentList: Storage.createScalableList()
Scalable list of persistent objects (can efficiently handle both small and large numbers of members). Key type: persistent object. Implementation: Link or B+Tree.

IPersistentMap: Storage.createMap()
Scalable map of persistent objects (can efficiently handle both small and large numbers of members). Key type: persistent object. Implementation: sorted array or B+Tree.

SpatialIndex: Storage.createSpatialIndex()
Index for spatial objects with integer coordinates. Key type: Rectangle. Implementation: R-Tree.

SpatialIndexR2: Storage.createSpatialIndexR2()
Index for spatial objects with real coordinates. Key type: RectangleR2. Implementation: R-Tree.

SortedCollection: Storage.createSortedCollection(PersistentComparator comparator, boolean unique)
Index with a user-defined comparator. Key type: any. Implementation: T-Tree.

FullTextIndex: Storage.createFullTextIndex(FullTextSearchHelper helper)
Full text index. Key type: any text. Implementation: B-Tree based inverted index.

Projection

Using Perst indices, a programmer can easily implement most simple SQL queries, like:
    select * from T where x=?;
Perst relations can be used to implement simple joins, like the following SQL query fetching all orders to a particular vendor:
    select * from Order O, Vendor V where O.vendorID = V.id AND V.name='ABC';
In Perst it is possible to first select a vendor using an index search and then traverse the corresponding relations to locate all the orders to this vendor.

But sometimes, it is necessary to implement more complex queries. This is also possible in Perst, without the need to write queries in some special (non-procedural) language. The Projection class is used to combine the results of several simple operations (such as an index search). Let us start explanation of this class with an example. Consider that we need to know the details of all the orders with names starting with 'D1' shipped by vendors having names starting with 'V1' or 'V3'. The SQL statement which performs this query is as follows:

    select * from Order O, Vendor V, Detail D where D.name like 'D1%' and O.detailID = D.id
        and O.vendorID = V.id and (V.name like 'V1%' OR V.name like 'V3%');
And now, we shall see how this can be done in Perst. Consider that we have indices for details and vendors:
    FieldIndex detailIndex = db.createFieldIndex(Detail.class, "name", true);
    FieldIndex vendorIndex = db.createFieldIndex(Vendor.class, "name", true);
Set of requested orders can be obtained in the following way:
    //  Projection from vendors to orders (using "orders" field of Link type)
    Projection v2o = new Projection(Vendor.class, "orders");
    //  Projection from details to orders (using "orders" field of Link type)
    Projection d2o = new Projection(Detail.class, "orders");
    // Select vendors with name like 'V1%'
    v2o.project(vendorIndex.getPrefix("V1"));
    // Select vendors with name like 'V3%'
    v2o.project(vendorIndex.getPrefix("V3"));
    // Select details with name like 'D1%'
    d2o.project(detailIndex.getPrefix("D1"));
    // Join projections
    v2o.join(d2o);
    // Get array of requested orders
    Order[] orders = (Order[])v2o.toArray(new Order[v2o.size()]);
As you can see, the Projection class is used for several purposes:
  1. To combine the results of several simple operations (using the OR operator).
  2. To eliminate duplicates resulting from such a merge.
  3. To get a set of related objects (perform projection using specified projection field).
  4. To join several projections (analogue of SQL JOIN).

It is possible to derive your own class from the Projection class to implement more sophisticated projections than using a single projection field.

If you need the result of a selection in some particular order, then the most efficient way is to use a FieldIndex for it. When you select objects using an index, the selected objects are sorted by the search key. If you need to sort a selection by a field which is not the search key, then you can use the Arrays.sort method. For example, if in the query described above we need to sort orders by the price field, it can be done with the following statement:

    Arrays.sort(orders, new Comparator() { 
       public int compare(Object a, Object b) { return ((Order)a).price - ((Order)b).price; }
    });

Transaction model

Perst preserves consistency of data in case of a system or an application failure. The transaction mechanism is used to implement an all-or-nothing database update. Transactions in Perst are started implicitly when an update operation is performed the first time and finished explicitly by the commit, rollback or close methods.

The committing of a transaction causes a synchronous write of changed pages to the disk. Synchronous writes (where an application is blocked until data is really flushed to the disk) are very expensive operations. The average positioning time for modern disks is about 10 msec, so they are usually not able to perform more than 100 writes in random places in one second. As a transaction usually consists of several updated pages, this leads to an average performance of about 10 transaction commits per second.

Performance can be greatly increased if you minimize the number of commits (use larger transactions). Perst uses a shadow mechanism for its transaction implementation. When an object is changed for the first time during a transaction, a shadow of the object is created and the original object is kept unchanged. If the object is updated multiple times during the transaction, the shadow is created only once. Because it uses shadows, Perst does not need a transaction log file; therefore, in Perst, long transactions will not cause a transaction log overflow as in most other DBMSes. Quite the contrary: if you do not call commit at all, Perst works as a DBMS without transaction support, adding almost no overhead for supporting transactions.

The only drawback of long transactions is the possibility of losing a lot of changes in case of a fault. Perst will preserve consistency of the database, but all changes made since the last commit will be lost.
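A typical all-or-nothing update can be sketched like this (from, to, amount and db are hypothetical; the two accounts are assumed to be persistence-capable objects):

```java
try {
    from.balance -= amount;
    from.modify();
    to.balance += amount;
    to.modify();
    db.commit();       // both updates become durable together
} catch (Throwable e) {
    db.rollback();     // the database reverts to the last committed state
}
```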

Perst also supports per-thread transactions. Three types of per-thread transactions are supported: exclusive, cooperative and serializable. In an exclusive transaction, only one thread can update the database. In cooperative mode, multiple transactions can work concurrently and the commit() method will be invoked only when the transactions of all the threads are terminated. Serializable transactions can also work concurrently. But unlike cooperative transactions, the threads are isolated from each other. Each thread has its own associated set of modified objects and committing the transaction will cause saving of only these objects to the database. To synchronize access to the objects in case of a serializable transaction, a programmer should use the lock methods of the IResource interface. A shared lock should be set before read access to any object, and an exclusive lock before write access. Locks will be automatically released when the transaction is committed (so a programmer should not explicitly invoke the unlock() method). In this case it is guaranteed that the transactions are serializable.

It is not possible to use the IPersistent.store() method in serializable transactions (you should use IPersistent.modify() instead). That is why it is also not possible to use the Index and the FieldIndex containers since they are based on the B-Tree and a B-Tree directly accesses database pages and uses the store() method to assign an OID to an inserted object. You should use the SortedCollection based on T-Tree instead or an alternative B-Tree implementation (see "perst.alternative.btree" property).
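A serializable per-thread transaction might be sketched as follows. The Storage.SERIALIZABLE_TRANSACTION constant name is assumed from the Perst API; account and db are hypothetical:

```java
// Inside a worker thread:
db.beginThreadTransaction(Storage.SERIALIZABLE_TRANSACTION);
account.exclusiveLock();      // IResource lock before write access
account.balance += amount;
account.modify();             // store() is not allowed here, use modify()
db.endThreadTransaction();    // commits this thread's changes and releases the locks
```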

Starting from version 2.66, Perst provides multi-client access to a database. This is implemented using OS file locks. Databases are accessed in multiple-readers-single-writer mode. In order to enable multi-client support, it is necessary to set the "perst.multiclient.support" property. Then, all accesses to a database should be performed within explicit transactions, i.e. transactions should be started using the beginThreadTransaction method. If a transaction is read-only, then the Storage.READ_ONLY_TRANSACTION mode should be used. Several read-only transactions can be executed in parallel. But if a transaction may modify the contents of the database, then the Storage.READ_WRITE_TRANSACTION mode should be used and the transaction will have exclusive access to the database.

Transaction access to the database is synchronized for the different processes in one system as well as for the different threads within one process. Transactions can be nested - you can invoke beginThreadTransaction as many times as you need, assuming that the same number of endThreadTransaction calls are invoked. So, you can wrap the code of each method accessing the database with the beginThreadTransaction and the endThreadTransaction methods. It is not possible to upgrade a transaction from shared (READ_ONLY_TRANSACTION) to exclusive (READ_WRITE_TRANSACTION). Hence, you should always use the READ_WRITE_TRANSACTION if the transaction may modify the database.

The outermost endThreadTransaction will cause the transaction to commit and release the OS file lock. Unlike endThreadTransaction, the rollbackThreadTransaction doesn't consider the counter of nested transactions - it aborts all nested transactions. If some process accessing the database crashes, the file lock held by this process is automatically released. According to the specification of the java.nio.channels.FileChannel.lock method, some operating systems may not support shared locks. In this case the only choice is to always use the READ_WRITE_TRANSACTION mode. Because the java.nio package was introduced in JDK 1.4, Perst is not able to provide multi-client access with earlier JDK releases.
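Multi-client access might be sketched as follows (the database file name and the root/index variables are illustrative):

```java
Storage db = StorageFactory.getInstance().createStorage();
db.setProperty("perst.multiclient.support", Boolean.TRUE);
db.open("shared.dbs");

// Updating transaction: exclusive access to the database
db.beginThreadTransaction(Storage.READ_WRITE_TRANSACTION);
// ... modify objects here ...
db.endThreadTransaction();    // outermost call commits and releases the file lock

// Read-only transaction: may run in parallel with other readers
db.beginThreadTransaction(Storage.READ_ONLY_TRANSACTION);
// ... query objects here ...
db.endThreadTransaction();
```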

Replication

Perst supports the master-slave replication mechanism. It allows us to have a single master node and several slave nodes. All updates of the database can happen only at the master node and are replicated at the slave nodes. The slave nodes are able to execute read-only requests working with the shadow snapshot of the database (state after the last commit).

Replication is page based - all dirty pages from the master node page pool are sent to the slave nodes. A slave node detects the moment of transaction commit (when the root page is sent to a slave node and the state index is changed in the database header) and sets an exclusive lock in this case. The read-only transactions at the slave nodes set a shared lock and work with the previous (consistent) state of the database. When a transaction commit is performed by the master node, a slave node uses this lock to transfer the database into the next consistent state. Starting from this moment, the read-only transactions at a slave node will work with the new, consistent state of the database.

Perst supports two replication models: static and dynamic. In the static model, all slave nodes have to be defined when an application is started. The master node tries to connect to all the specified slave nodes. After establishing all possible connections with the active slave nodes, it starts working, replicating the changes to these slave nodes. The content of all statically defined slave nodes is considered to be the same as that of the master node. Thus, when a database is opened, the master expects that all the connected slave nodes have the same state as its own and will send to them only the pages modified by the application after the database is opened. If you are going to add a new static slave node, you should first (before the database is opened) manually copy the database from the master node to this new slave node.

In the dynamic model, it is possible to add new replicated nodes at each moment in time. In this case, Perst transfers the complete database to the new node, synchronizing its state with the master node state. When the new replicated slave node is synchronized with the master node, it will act in the same way as a static, replicated node: the master will send to both of them all the updates performed by the application. The dynamic mode is especially useful for main-memory databases (with an infinite page pool and a NullFile stub). In this case, the size of the database is not expected to be very large and its transfer to a new replicated node should not take a lot of time.

As far as replicated nodes are able to execute read-only queries, the replication mechanism can also be used to improve performance by distributing the load between several nodes. To be able to execute read-only transactions at a slave node, an application should use per-thread transactions wrapped by the Storage.beginThreadTransaction(Storage.REPLICATION_SLAVE_TRANSACTION) and the Storage.endThreadTransaction() method invocations. It is possible to run many such transactions concurrently. Only when the master node commits a transaction, a slave node has to set an exclusive lock blocking all read-only transactions (certainly it has to first wait for the completion of all currently executing read-only transactions. So, these transactions should be short enough to avoid blocking the replication mechanism for a long time). This exclusive lock is released very fast - immediately after placing a new root page in the pool. Now read-only transactions will see the new state of the database.

When an application running at the master node closes the storage, the master sends a CLOSE request to all the slave nodes causing them to close their connection with the master. In case of abnormal termination of one of the slave nodes, the master continues to work with the other slave nodes. If no more slave nodes are online, the master continues to work autonomously. The Perst replication mechanism doesn't enforce any particular strategy for the recovery of nodes. It is up to the application how to handle the situation of the master crashing and choosing a new master node.

JSQL

Overview

JSQL is a subset of the SQL language which can be used to select object instances according to a selection condition. JSQL uses a notation that is more common in object-oriented programming than in relational databases: table rows are considered as object instances, and the table as the class of these objects. Unlike SQL, JSQL is oriented to work with objects instead of SQL tuples, so the result of each query execution is a set of objects of one class. The main differences between JSQL and standard SQL are:

  1. There are no joins of several tables and no nested subqueries. A query always returns a set of objects from one table.
  2. Standard Java types are used for atomic table columns.
  3. There are no null values, only null references.
  4. Arrays can be used as record components. A special exists quantifier is provided for locating elements in arrays.
  5. User methods can be defined for table records (objects) as well as for record components.
  6. References between objects are supported, including user-defined reference resolvers.
  7. Because the query language is deeply integrated with the Java language, identifiers as well as keywords are case sensitive.
  8. There is no implicit conversion of integer and floating point types to string representation. If such a conversion is needed, it should be done explicitly.

In Perst, the org.garret.perst.Query class allows you to execute JSQL queries. To execute a JSQL query, it is necessary to provide an iterator (java.util.Iterator) over the objects to inspect and a string with the query condition. It is also possible to specify indices, resolvers and query parameters. The result of a query execution is another iterator, which can be used to traverse the selected objects (the objects matching the specified search criteria).

All the Perst collection classes contain a select method which allows you to filter collection members using a JSQL query. This method is given the class of the objects stored in the collection (for some collections, such as FieldIndex, this is known and need not be specified) and the JSQL condition. select returns an iterator over the selected objects.

JSQL formal grammar

The following rules in BNF-like notation specify the grammar of the JSQL query language search predicate:

Grammar conventions
Example        Meaning
expression     non-terminals
not            terminals
|              disjoint alternatives
(not)          optional part
{1..9}         repeat zero or more times

select-condition ::= ( expression ) ( traverse ) ( order )
expression ::= disjunction
disjunction ::= conjunction 
        | conjunction or disjunction
conjunction ::= comparison 
        | comparison and conjunction
comparison ::= operand = operand 
        | operand != operand 
        | operand <> operand 
        | operand < operand 
        | operand <= operand 
        | operand > operand 
        | operand >= operand 
        | operand (not) like operand 
        | operand (not) like operand escape string
        | operand (not) in operand
        | operand (not) in expressions-list
        | operand (not) between operand and operand
	| operand is (not) null
operand ::= addition
addition ::= multiplication 
        | addition +  multiplication
        | addition || multiplication
        | addition -  multiplication
multiplication ::= power 
        | multiplication * power
        | multiplication / power
power ::= term
        | term ^ power
term ::= identifier | number | string 
        | true | false | null 
	| current 
	| ( expression ) 
        | not comparison
	| - term
	| term [ expression ] 
	| identifier . term 
	| function term
        | count (*) 
        | contains array-field (with expression) (group by identifier having expression)
        | exists identifier : term
function ::= abs | length | lower | upper
        | integer | real | string
        | sin | cos | tan | asin | acos
        | atan | log | exp | ceil | floor
        | sum | avg | min | max 
string ::= ' { { any-character-except-quote } ('') } '
expressions-list ::= ( expression { , expression } )
order ::= order by sort-list
sort-list ::= field-order { , field-order }
field-order ::= field (asc | desc)
field ::= identifier { . identifier }
fields-list ::=  field { , field }
user-function ::= identifier

Identifiers are case sensitive; they begin with an a-z, A-Z, '_' or '$' character, contain only a-z, A-Z, 0-9, '_' or '$' characters, and do not duplicate any SQL reserved words.

List of reserved words
abs      acos     and       asc      asin
atan     avg      between   by       contains
cos      ceil     count     current  desc
escape   exists   exp       false    floor
group    having   in        integer  is
length   like     log       lower    max
min      not      null      or       real
sin      string   sum       tan      true
upper    with
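These lexical rules can be checked mechanically. The following stand-alone sketch (the regular expression and the hard-coded word list are assumptions derived from the rules and table above, not Perst code) validates candidate identifiers:

```java
import java.util.Set;
import java.util.regex.Pattern;

public class JsqlIdentifierCheck {
    // Identifier shape per the text: starts with a letter, '_' or '$',
    // then letters, digits, '_' or '$'; everything is case sensitive.
    static final Pattern IDENT = Pattern.compile("[a-zA-Z_$][a-zA-Z0-9_$]*");

    // Reserved words as listed in the manual.
    static final Set<String> RESERVED = Set.of(
        "abs","acos","and","asc","asin","atan","avg","between","by","contains",
        "cos","ceil","count","current","desc","escape","exists","exp","false",
        "floor","group","having","in","integer","is","length","like","log",
        "lower","max","min","not","null","or","real","sin","string","sum",
        "tan","true","upper","with");

    static boolean isValidIdentifier(String s) {
        return IDENT.matcher(s).matches() && !RESERVED.contains(s);
    }

    public static void main(String[] args) {
        System.out.println(isValidIdentifier("$salary2")); // true
        System.out.println(isValidIdentifier("like"));     // false: reserved word
        System.out.println(isValidIdentifier("Like"));     // true: keywords are case sensitive
        System.out.println(isValidIdentifier("2fast"));    // false: starts with a digit
    }
}
```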

JSQL extends ANSI standard SQL operations by supporting bit manipulation operations. The and/or operators can be applied not only to boolean operands but also to operands of integer type. The result of applying the and/or operator to integer operands is an integer value whose bits are set by a bitwise AND/OR operation. Bit operations can be used for an efficient implementation of small sets. The raising-to-a-power operation ^ is also supported by JSQL for integer and floating point types.
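The small-set reading of integer and/or can be illustrated with plain Java bit operations (the weekday encoding below is just an illustration, not part of Perst):

```java
public class BitSetDemo {
    // Encode a small set of weekdays (0..6) as bits of an int.
    static int set(int... days) {
        int s = 0;
        for (int d : days) s |= 1 << d;
        return s;
    }

    public static void main(String[] args) {
        int workDays = set(1, 2, 3, 4, 5);
        int meetingDays = set(2, 4, 6);
        // JSQL "workDays and meetingDays" on integers is bit-AND: days in both sets.
        int both = workDays & meetingDays;
        // JSQL "workDays or meetingDays" is bit-OR: days in either set.
        int either = workDays | meetingDays;
        System.out.println(Integer.toBinaryString(both));   // 10100: days 2 and 4
        System.out.println(Integer.toBinaryString(either)); // 1111110: days 1..6
    }
}
```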

Arrays

JSQL is able to work with Java array types. JSQL provides a set of special constructions for dealing with arrays:

  1. It is possible to get the number of elements in the array by using the length() function.
  2. Array elements can be fetched by the [] operator. If the index expression is out of the array range, then an exception will be raised.
  3. The in operator can be used for checking whether the array contains the value specified by the left operand. This operation can only be used for arrays of atomic types: arrays with boolean, numeric, reference or string components.
  4. Iteration through array elements is performed by the exists operator. The variable specified after the exists keyword can be used as an index into arrays in the expression following the exists quantifier. This index variable iterates through all possible array index values until the value of the expression becomes true or the index runs out of range. The condition
            exists i: (contract[i].company.location = 'US')
    
    will select all details which are shipped by companies located in the US, while the query:
            not exists i: (contract[i].company.location = 'US')
    
    will select all details which are shipped only by companies outside the US.

    Nested exists clauses are allowed. Using nested exists quantifiers is equivalent to nested loops using the corresponding index variables. For example, the query

            exists column: (exists row: (matrix[column][row] = 0))
    
    will select all records containing 0 among the elements of the matrix field (assumed to be an array of integer arrays). This construction is equivalent to the following two nested loops:
           boolean result = false;
       search:
           for (int column = 0; column < matrix.length; column++) { 
                for (int row = 0; row < matrix[column].length; row++) { 
                     if (matrix[column][row] == 0) { 
                         result = true;
                         break search;
                     }
                }
           }
    
    The order of using indices is significant! The result of the following query execution
            exists row: (exists column: (matrix[column][row] = 0))
    
    will be completely different from the result of the previous query. The program may even hang in the latter case, due to an infinite loop occurring for empty matrices.
  5. It is possible to use the following clause:
    contains array-field (with expression) (group by array-component-field having expression-with-aggregate-functions)
    to select arrays with elements matching the specified with condition, and to group the elements of the array in order to apply aggregate functions.
    If the with expression is specified in the contains clause, this construction is equivalent to the exists statement with the same condition, but with two exceptions:

Strings

All strings in JSQL have varying length, and a programmer need not worry about specifying a maximal length for character fields. All operations acceptable for arrays are also applicable to strings. In addition, strings have a set of their own operations. First of all, strings can be compared with each other using the standard relational operators.

The like construction can be used for matching strings against a pattern containing the special wildcard characters '%' and '_'. The '_' character matches any single character, while the '%' character matches any number of characters (including zero). The extended form of the like operator with an escape part can be used to treat the '%' and '_' characters in the pattern as normal characters if they are preceded by a special escape character, specified after the escape keyword.
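The like semantics described above can be sketched as a small recursive matcher (a stand-alone illustration of the rules, not Perst's actual implementation):

```java
public class LikeMatcher {
    // Matches s against a pattern where '%' matches any run of characters,
    // '_' matches exactly one character, and a character preceded by esc
    // is taken literally. Pass esc == 0 for no escape character.
    static boolean like(String s, String pattern, char esc) {
        return match(s, 0, pattern, 0, esc);
    }

    private static boolean match(String s, int i, String p, int j, char esc) {
        if (j == p.length()) return i == s.length();
        char c = p.charAt(j);
        if (esc != 0 && c == esc && j + 1 < p.length()) {
            // escaped character: must match literally, wildcard meaning suppressed
            return i < s.length() && s.charAt(i) == p.charAt(j + 1)
                && match(s, i + 1, p, j + 2, esc);
        }
        if (c == '%') {
            // '%' matches zero or more characters: try every possible split
            for (int k = i; k <= s.length(); k++) {
                if (match(s, k, p, j + 1, esc)) return true;
            }
            return false;
        }
        if (c == '_') {
            // '_' matches exactly one character
            return i < s.length() && match(s, i + 1, p, j + 1, esc);
        }
        return i < s.length() && s.charAt(i) == c && match(s, i + 1, p, j + 1, esc);
    }

    public static void main(String[] args) {
        System.out.println(like("abcde", "a%e", (char) 0)); // true
        System.out.println(like("abc", "a_c", (char) 0));   // true
        System.out.println(like("a_c", "a\\_c", '\\'));     // true: '_' taken literally
        System.out.println(like("abc", "a\\_c", '\\'));     // false
    }
}
```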

It is possible to search for a substring within a string by using the in operator: the expression ('blue' in color) will be true for all records which contain 'blue' in their color field. Strings can be concatenated by the + or || operators. The latter was added only for compatibility with the ANSI SQL standard. Because JSQL doesn't support implicit conversion to string type in expressions, the semantics of the + operator can be redefined for strings.

References

References can be dereferenced using the same dot notation as is used for accessing structure components. For example, the following query:
        company.address.city = 'Chicago'
will access the record referenced by the company component of a Contract record and extract the city component of the address field of the referenced record from the Supplier table.

References can be checked for null by is null or is not null predicates. Also, references can be compared for equality with each other as well as with the special null keyword. When a null reference is dereferenced, an exception will be raised by JSQL.

There is a special keyword current, which can be used to get a reference to the current record during a table search. Usually, the current keyword is used for the comparison of the current record identifier with other references or for locating the current record identifier within an array of references. For example, the following query will search the Contract table for all active contracts (assuming that the field canceledContracts has Contract[] type):

        current not in supplier.canceledContracts

A programmer can define methods for structures, which can be used in queries with the same syntax as normal structure components. Such methods should have no arguments except the pointer to the object to which they belong (the this pointer in Java), and should return an atomic value (of boolean, numeric, string or reference type). Also, a method should not change the object instance (it must be an immutable method). User-defined methods can thus be used for creating virtual components: components which are not stored in the database but are instead calculated from the values of other components. For example, you can use the standard Java java.util.Date class with methods such as getYear(), getMonth() and so on. Thus, it is possible to specify queries like "delivery.getYear = 99" in an application where the delivery record field has the java.util.Date type.

Functions

Predefined functions
Name      Argument type   Return type   Description
abs       integer         integer       absolute value of the argument
abs       real            real          absolute value of the argument
sin       real            real          sin (rad)
cos       real            real          cos (rad)
tan       real            real          tan (rad)
asin      real            real          arcsin
acos      real            real          arccos
atan      real            real          arctan
exp       real            real          exponent
log       real            real          natural logarithm
ceil      real            real          the smallest integer value that is not less than the argument
floor     real            real          the largest integer value that is not greater than the argument
integer   real            integer       conversion of real to integer
length    array           integer       number of elements in array
lower     string          string        lowercase string
real      integer         real          conversion of integer to real
string    integer         string        conversion of integer to string
string    real            string        conversion of real to string
upper     string          string        uppercase string

Optimization

JSQL provides a mechanism for fast location of objects by a unique primary key. If the query condition is an equality comparison between a scalar (numeric or string) field and a constant value, or if the query condition is a conjunction (logical AND) whose left operand is such an equality comparison, then JSQL tries to use an index to locate the object by this key value. If an object is found and the query condition is a conjunction, JSQL also checks that the value of the right operand is true.

To be able to use indices, a programmer has to obtain a query object and register the indices in it. JSQL supports the Perst Index and FieldIndex indices. Indices are located by JSQL by their field name. It is assumed that the class of objects returned by the index iterator is the same as the one specified in the select query. If you are going to use the same query instance for selecting data from different collections (with different collection member classes), please check that there are no name conflicts for the key fields, i.e. that an index corresponding to one collection will not be applied to another collection.

Perst has a very simple JSQL query optimizer. Perst is able to use indices for query execution if all the conditions below are true:

  1. The query predicate consists of one or more conjuncts (expressions combined with logical AND operator).
  2. One of the conjuncts is a simple comparison operator (equal, less than, greater than, less than or equal, greater than or equal), a pattern match (LIKE operator), a range check (BETWEEN operator), a set membership test (IN operator), or a sequence of equality comparisons combined with the logical OR operator where the left operand of all comparisons refers to the same field.
  3. One of the operands of this comparison operation is an indexed field (a field for which an index exists). If this is not a field of the queried class, then indexes should exist for each reference in the field access path: for example, if the condition is (company.location.city == "London"), then indexes are required not only for the "city" field, but also for the "location" and "company" fields.
  4. The other operands of the comparison operation are either constants or query parameters.

Maintaining indices is the responsibility of the programmer; JSQL only uses the indices for searching elements by their keys.
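The optimizer's treatment of a conjunction can be mimicked with a plain HashMap standing in for a unique index (everything below is illustrative; Perst's real Index/FieldIndex classes operate on the storage):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

public class IndexShortcutDemo {
    record Person(String name, int age) {}

    // A unique "index" on the name field (hypothetical in-memory stand-in).
    static final Map<String, Person> NAME_INDEX = new HashMap<>();
    static {
        NAME_INDEX.put("alice", new Person("alice", 35));
        NAME_INDEX.put("bob", new Person("bob", 52));
    }

    // Models "name = <key> and <rest>": the equality conjunct on the indexed
    // field locates a single candidate by key instead of scanning all records,
    // then the remaining conjunct is checked on that candidate only.
    static boolean query(Map<String, Person> index, String key, Predicate<Person> rest) {
        Person candidate = index.get(key);
        return candidate != null && rest.test(candidate);
    }

    public static void main(String[] args) {
        // Query: name = 'alice' and age < 40
        System.out.println(query(NAME_INDEX, "alice", p -> p.age() < 40)); // true
        // Query: name = 'bob' and age < 40
        System.out.println(query(NAME_INDEX, "bob", p -> p.age() < 40));   // false
    }
}
```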

Database

For those of you who prefer to deal with records in tables instead of objects, Perst provides the Database class. This class emulates a relational database on top of Perst. Certainly, even with this class, Perst provides a very object-oriented way of dealing with persistent data. The Database class allows you to create/drop tables, add/drop indices, create/update/delete records. It automatically updates indices when a new record is added or removed. Also, the Database class makes it possible to prepare and execute queries.

The Database class is used as a wrapper for the underlying Perst storage. The storage passed to the constructor of the Database class should be opened and either not yet initialized, or initialized earlier by the same Database constructor. It is not possible to attach a database to a Perst storage whose existing root object is other than the one set by the Database constructor.

A table in a database is associated with a Java class. Almost all methods of the Database class require either a Class parameter or a persistent object instance; in the latter case, the class of the object is used. If Perst finds no table associated with the class, it tries to find a table for the superclass, and so on. Hence, it is possible to have a "polymorphic" table: a table containing records (objects) of multiple classes derived from one class. Methods dealing with an object instance have an overloaded version where the class can be specified explicitly. This makes it possible to place an object in a table associated with some particular interface.

A database provides two ways of querying tables: using simple and prepared queries. In the first case, Database.select directly returns the result of query execution (iterator). In the second case, it is possible to execute a prepared query multiple times, specifying different values of parameters.

In Perst, it is the responsibility of the programmer to maintain the consistency of indices: before updating a key field it is necessary to exclude the object from the index, and after assigning a new value to the key field, to reinsert it into the index. The Database class provides the excludeFromAllIndices and includeInAllIndices methods, which can be used to exclude or include an object from/in all related indices. If you know exactly which field will be updated, it is more efficient to use the excludeFromIndex and includeInIndex methods, specifying the name of the updated field. But it is more convenient to use the Database.updateKey method, which performs all these steps itself: exclude from the index, update, mark the object as modified and reinsert it into the index. It is safe to call the updateKey method for fields which are not actually used in any index; in this case excludeFromIndex/includeInIndex does nothing.

Please find more information on the Database class methods in Javadoc and look at the JSQL-SSD example - Supplier-Shipment-Detail database built using the Database class.

Code generator

As mentioned above, Perst provides the JSQL query language with an SQL-like syntax. SQL grammar is close to natural language and is therefore convenient for writing and understanding queries. But in many cases a query is not written by a human, but instead constructed from data specified by the user in a GUI form. In this case, using a textual form of query, as with SQL and JSQL, is inconvenient and inefficient. First, the programmer must take care to follow a demanding syntax, convert all parameter values to string format, etc. Then the database must parse the query to transform it into an abstract syntax tree (AST), which will be used either directly to execute the query, or to generate lower-level code (for example, Java Virtual Machine byte code, or native instructions in C/C++). This parsing step imposes a performance overhead.

Starting with version 4.04, Perst provides another way to construct queries: a code generator. This is a special interface class whose methods correspond to JSQL operations. Consider the following example, in SQL:

     select * from Person where salary > 10000 and age <= 40 order by name ;
In Perst, such a query can be written in this way:
     for (Person p : db.<Person>select(Person.class, "salary > 10000 and age <= 40 order by name"))
     {
         ...
     }
Using the code generator, the query can be built as follows:
     Query<Person> query = db.<Person>createQuery(Person.class);
     CodeGenerator code = query.getCodeGenerator();
     code.predicate(code.and(code.gt(code.field("salary"), code.literal(10000)),
                             code.le(code.field("age"), code.literal(40))));
     code.orderBy("name");
     for (Person p : query) 
     {
         ...
     }
Most of the methods of the CodeGenerator class return a result of type CodeGenerator.Code interface. This interface provides no methods, since the structure of nodes that it generates is opaque to the programmer. These nodes are used as operands for other methods, in order to construct an abstract syntax tree (AST) representing the query.

Method CodeGenerator.predicate(Code condition) specifies the query predicate. If this method is not invoked, then the query will select all instances of the specified class.

Method CodeGenerator.orderBy allows the user to specify the required sort order. It is possible to call this method several times, in which case sorting will be performed by all specified fields in the specified order, as illustrated in another example. Here the operation is performed in SQL:

 
     SELECT * FROM Car WHERE color IN ('black', 'silver') 
         AND mileage < 100000 AND price < 20000 AND (airCondition OR climatControl)
         ORDER BY price DESC, model ASC;
Below is a code fragment constructing this query using CodeGenerator:
     Query<Car> query = db.<Car>createQuery(Car.class);
     CodeGenerator code = query.getCodeGenerator();
     code.predicate(code.and(code.and(code.and(code.in(code.field("color"), 
                                                       code.list(code.literal("black"), 
                                                                 code.literal("silver"))),
                                                code.lt(code.field("mileage"), 
                                                        code.literal(100000))),
                                      code.lt(code.field("price"),
                                              code.literal(20000))),
                             code.or(code.field("airCondition"),
                                     code.field("climatControl"))));
     code.orderBy("price", false);
     code.orderBy("model", true);
                                               
     for (Car car : query) 
     {
         ...
     }
The query should be constructed using the createQuery methods of the Storage or Database classes. Method Query.getCodeGenerator resets the query (clears all specified parameters, order-by clauses and query predicates), so it is possible to call these methods several times for the same query. There are two versions of these methods: one takes a Class parameter specifying the class for which the query will be constructed; the other, overloaded version takes no parameters, on the assumption that the class was previously associated with the query using the Query.setClass method.

Using AspectJ

The main idea of aspect oriented programming (AOP) is to separate the different aspects of systems, implement them independently and then automatically combine the different aspects into a single program. AspectJ is a very popular tool which brings the ideas of AOP to the Java language.

Concerning the interface to a database system, there is the object persistence aspect, which should control transparent loading and storing of objects from/to the storage. Without AspectJ, Perst uses a programmer-controlled, recursive object loading mechanism: when some object is requested, all objects referenced from this object are also fetched, unless they redefine the recursiveLoading method to return false. To store a modified object in a database, a programmer should explicitly call the store() or modify() method. With AspectJ this is not needed: the persistence aspect will automatically load a referenced object and mark it as modified when the values of some of the object's fields are changed. Also, there is no need to derive all persistence-capable classes from the Persistent base class; this can easily be done using an AspectJ declaration construction.

The AspectJ interface to Perst was developed by Patrick Morris-Suzuki. It is located in the org.garret.perst.aspectj package. To build it, you can use the perst/src/compile_aspectj.bat batch file (AspectJ should be properly installed beforehand); it will build the perst/lib/perst_aspectj.jar file. Each persistence-capable class must implement org.garret.perst.aspectj.AutoPersist, which is a marker interface with no methods. However, the user can avoid this by writing a simple aspect that defines the persistence-capable classes like this:

public aspect PersistentClasses {
    declare parents: mypackage.* implements AutoPersist;
}
Then you can build your application like this:
ajc *.java -aspectpath /perst/lib/perst_aspectj.jar
Restrictions of AspectJ API to Perst:

  1. Persistence-capable objects still need an empty default constructor (which will be used by the database to create an instance of the object when it is loaded from the storage).
  2. The hashCode() and equals() methods will work correctly only after an object is assigned its OID.
  3. The AspectJ interface is not able to automatically load an object when its fields are accessed from outside. Hence, this interface correctly handles only the invocation of methods and accesses to this fields. While in principle it is possible to automatically handle access to foreign object components with AspectJ as well, it is done in a very inefficient way, and since it contradicts good OOP practice, this feature is switched off. (You can switch it on by uncommenting one code fragment in the PersistenceAspect file.)

In the perst/tst/aspectj directory there is an example of an application using the AspectJ interface to Perst: Guess. It is the same "Guess an animal" example as located in the perst/tst directory (which does not use AspectJ), but it doesn't contain explicit calls to the store() method. Also, the original Guess example recursively loaded the whole game tree when the root object was accessed, whereas the AspectJ version fetches object instances only on demand. You can build the example using the compile.bat script and run it using the Guess.bat script.

Using JAssist

JAssist is yet another popular product for the transformation of Java byte code. It is also possible to build a transparent API for Perst using the JAssist package. Despite being much simpler and more compact than AspectJ, it makes possible an even more convenient and flexible Perst interface, with fewer limitations.

With JAssist you can use a normal Java compiler. The Perst translator for JAssist automatically makes all classes matching specific name patterns persistence capable. With this translator, it is not required that application classes be derived from the Persistent class or provide an empty default constructor.

To be able to use the JAssist interface, you should use the special JAssist class loader instead of the default Java class loader to load all classes which you want to make persistence capable. Classes with fully qualified names matching one of the specified patterns will be transformed during loading. Other classes are loaded unchanged.

I have slightly modified the JAssist code to be able to distinguish access to this fields from access to the fields of foreign objects. This makes it possible to eliminate extra overhead for this field accesses. If you are going to use the original JAssist library, then you should comment out my code using the !isSelfReader() condition. In this case Perst will not be able to handle access to foreign (non-this) fields; you should use getter/setter methods instead. Alternatively, you can replace this condition with true, allowing access to foreign fields, but with significant degradation of performance and increased code size, because in this case a call to the load() method will be inserted before ALL accesses to fields of a persistence-capable object. The modified versions of the JAssist files are included in the Perst distribution in the perst/src/org/garret/jassist/edit directory.

Transformation includes the following steps:

  1. Add a special constructor which will be used by Perst to create an object instance when it is loaded from storage (it is not the default constructor which may or may not exist).
  2. If a class is not derived from the persistent class and its superclass is java.lang.Object, then, replace the superclass with org.garret.perst.Persistent.
  3. Insert a call to Persistent.load() method at the beginning of each instance method.
  4. Insert a call to Persistent.modify() method before each write to a field.
  5. Insert the recursiveLoading method returning false.
  6. Add pack/unpack methods to the class and create a load factory class.

The JAssist interface has only one restriction: you should not access the fields of any persistent object other than this in a constructor. Below is an example of using the JAssist interface:

package com.mycompany.mypackage;
import org.garret.perst.jassist.PerstTranslator;
import javassist.*;
public class MyApp { 
    public static void main(String[] args) { 
        Translator trans = new PerstTranslator(new String[]{"com.mycompany.*"});
        ClassPool pool = ClassPool.getDefault();
        Loader cl = new Loader(pool);
        cl.addTranslator(pool, trans);
        Thread.currentThread().setContextClassLoader(cl);
        cl.run("Main", args);
    }
}
In this example all the classes from com.mycompany.mypackage except MyApp will be loaded by the JAssist class loader and automatically made persistence capable.

To build the JAssist interface, you should use the compile-aspectj.bat file in the perst/src directory. The javassist.jar file is included in the Perst distribution and located in the perst/lib directory. You can find examples using the JAssist interface in the perst/tst/jassist directory.

Perst implementation issues

Below is a more detailed description of the Perst implementation. This section can be skipped if you are not interested in implementation details.

Memory allocation

Memory allocation in Perst is performed using bitmaps. The memory is allocated in chunks called the allocation quantum. In the current version of Perst, the size of the allocation quantum is 64 bytes; this means that the sizes of all allocated objects are aligned on a 64-byte boundary. Each 64 bytes of database memory is represented by one bit in the bitmap. To locate a hole of the requested size in the bitmap, Perst sequentially searches the bitmap pages for the corresponding number of successive cleared bits. Perst uses three arrays indexed by the bitmap byte, which make possible a fast calculation of the hole offset and size within the byte.
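The hole search can be modeled by the following simplified stand-alone sketch (one bit per 64-byte quantum; a naive linear scan from the beginning instead of Perst's cyclic, table-driven byte scan):

```java
public class BitmapAllocatorSketch {
    static final int QUANTUM = 64; // bytes per bit, as in Perst

    // One bit per quantum; true means the quantum is in use.
    private final boolean[] bitmap;

    BitmapAllocatorSketch(int quanta) { bitmap = new boolean[quanta]; }

    // Returns the byte offset of an allocated block of 'size' bytes,
    // or -1 if no hole of the requested size exists.
    long allocate(int size) {
        int need = (size + QUANTUM - 1) / QUANTUM; // round up to whole quanta
        int run = 0;                               // current run of cleared bits
        for (int i = 0; i < bitmap.length; i++) {
            run = bitmap[i] ? 0 : run + 1;
            if (run == need) {
                int start = i - need + 1;
                for (int j = start; j <= i; j++) bitmap[j] = true; // mark used
                return (long) start * QUANTUM;
            }
        }
        return -1;
    }

    // Deallocate: clear the bits covering the block.
    void free(long offset, int size) {
        int start = (int) (offset / QUANTUM);
        int need = (size + QUANTUM - 1) / QUANTUM;
        for (int j = start; j < start + need; j++) bitmap[j] = false;
    }

    public static void main(String[] args) {
        BitmapAllocatorSketch a = new BitmapAllocatorSketch(16);
        System.out.println(a.allocate(100)); // 0   (rounded up to 2 quanta)
        System.out.println(a.allocate(64));  // 128 (1 quantum after the first block)
        a.free(0, 100);
        System.out.println(a.allocate(64));  // 0   (reuses the freed hole)
    }
}
```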

Perst performs cyclic scanning of the bitmap pages. It keeps the identifier of the current bitmap page and the current position within the page. Every time an allocation request arrives, scanning of the bitmap starts from the current position. When the last allocated bitmap page is scanned, scanning continues from the beginning (from the first bitmap page) until the current position is reached. When no free space is found after a full cycle through all bitmap pages, a new block of memory is allocated. The size of the extension is the larger of the size of the allocated object and the extension quantum. The extension quantum is a parameter of the database, specified in the constructor. A bitmap is extended to be able to map additional space. If the virtual space is exhausted and no more bitmap pages can be allocated, then the OutOfMemory exception is raised.

Allocating memory using bitmaps provides high locality of references (objects are mostly allocated sequentially) and also minimizes the number of modified pages. Minimizing the number of modified pages is significant when the commit operation is performed and all dirty pages have to be flushed to disk. When all cloned objects are placed sequentially, the number of modified pages is minimal, so the transaction commit time is also reduced. Using the extension quantum also helps to preserve sequential allocation: once a bitmap is extended, objects will be allocated sequentially until the extension quantum is completely used up. Only after reaching the end of the bitmap does scanning restart from the beginning, searching for holes in previously allocated memory.

To reduce the number of bitmap page scans, Perst associates a descriptor with each page, which is used to remember the maximal size of a hole on the page. The maximal hole size is calculated in the following way: if an object of size M cannot be allocated from this bitmap page, then the maximal hole size is less than M, so M is stored in the page descriptor if the previous value of the descriptor is larger than M. For subsequent allocations of objects of size greater than or equal to M, this bitmap page is skipped. The page descriptor is reset when some object is deallocated within this bitmap page.

Some database objects (like hash table pages) should be aligned on a page boundary to provide more efficient access. The Perst memory allocator checks the requested size, and if it is aligned on a page boundary, the address of the allocated memory segment is also aligned on a page boundary. The search for a free hole will be faster in this case, because Perst increases the step of the current position increment according to the value of the alignment.

To be able to deallocate memory used by an object, Perst needs to keep information about the object size somewhere. The Perst memory allocator deals with two types of objects - normal table records and page objects. All table records are prepended by a record header, which contains the record size and a pointer to an L2-list linking all records in the table. So, the size of the table record object can be extracted from the record header. Page objects always occupy the whole database page and are allocated at the positions aligned on a page boundary. Page objects have no headers. Perst distinguishes page objects from normal objects by using a special marker in the object index.

Shadow transaction mechanism

Each record (object) in Perst has a unique identifier (OID). Object identifiers are used to implement references between objects. To locate an object by reference, its OID is used as an index in the array of object offsets within the file. This array is called the object index, and an element of this array is called an object handle. There are two copies of the object index in Perst, one of which is current and the other shadow. The header of the database contains pointers to both object indices and indicates which index is current at the moment.

When an object is modified the first time, it is cloned (a copy of the object is created) and the object handle in the current index is changed to point to the newly created object copy. The shadow index still contains the handle which points to the original version of the object. All changes are done with the object copy, leaving the original object unchanged. Perst makes a mark in a special bitmap page of the object index which contains the modified object handle.

When a transaction is committed, Perst first checks if the size of the object index was increased during the current transaction. If so, it also reallocates the shadow copy of the object index. Then, Perst frees memory for all "old objects", i.e. objects which were cloned within the transaction. Memory cannot be deallocated before the commit, because we want to preserve the consistent state of the database by keeping the cloned objects unchanged. If we deallocated memory immediately after cloning, a new object could be allocated in the place of the cloned object and consistency would be lost. Since memory deallocation is performed in Perst through bitmaps, using the same transaction mechanism as for normal database objects, deallocation of object space requires clearing some bits in a bitmap page, which also should be cloned before modification. Cloning the bitmap page requires new space for the page copy, and we could reuse the space of deallocated objects - but this is not acceptable for the reason explained above: we would lose database consistency. That is why deallocation of an object is done in two steps. When an object is cloned, all bitmap pages used for marking the object's space are also cloned (if not cloned before). So when a transaction is committed, we only clear bits in the bitmap pages, and no more memory allocation requests can be generated at this moment.

After deallocation of old copies, Perst flushes all modified pages to the disk to synchronize content of the memory and the disk files. After that Perst changes the current object index indicator in the database header to switch the roles of the object indices. Now the object index, which was current becomes shadow, and the shadow index becomes current. Then, Perst again flushes the modified page (i.e. page with the database header) to the disk, transferring the database to a new, consistent state. After that Perst copies all modified handles from the new object index to the object index which was previously shadow and now becomes current. At this moment, contents of both indices are synchronized and Perst is ready to start a new transaction.
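The clone-and-switch scheme can be sketched as a toy model (plain Java arrays in place of the on-disk indices; all names are hypothetical and the flush/durability steps are omitted):

```java
// Toy model of Perst's shadow object index: two offset arrays plus a
// "current" indicator. Modifying an object redirects the current handle
// to a fresh copy; the shadow handle keeps the old offset.
class ShadowIndex {
    long[][] index = new long[2][8];      // index[0], index[1]: object offsets by OID
    int current = 0;                      // which copy is current (in real Perst: a header flag)
    boolean[] modified = new boolean[8];  // dirty-handle bitmap

    long get(int oid) { return index[current][oid]; }

    void update(int oid, long newOffs) {  // object was cloned to newOffs
        index[current][oid] = newOffs;    // current handle -> new copy
        modified[oid] = true;             // remember for commit
    }

    void commit() {
        current ^= 1;                     // switch roles: shadow becomes current
        for (int oid = 0; oid < modified.length; oid++)
            if (modified[oid]) {          // propagate only dirty handles
                index[current][oid] = index[current ^ 1][oid];
                modified[oid] = false;
            }
    }

    void rollback() {
        for (int oid = 0; oid < modified.length; oid++)
            if (modified[oid]) {          // restore old offsets from the shadow index
                index[current][oid] = index[current ^ 1][oid];
                modified[oid] = false;
            }
    }
}
```

In real Perst the `current` indicator lives in the database header, and writing that single page is the atomic point of commit; everything before it can be lost without harming consistency.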

The bitmap of modified object index pages is used to minimize transaction commit time: only the modified pages of the object index, not the whole index, need to be copied. After the commit, this bitmap is cleared.

When a transaction is explicitly aborted by the Storage.rollback method, the shadow object index is copied back to the current index, eliminating all changes done by the aborted transaction. After the end of copying, both indices are identical again and the database state corresponds to the moment before the start of the current transaction.

The allocation of object handles is done through a list of free handles. The header of this list is also shadowed, and two instances of the list header are stored in the database header. Switching between them is done in the same way as switching of the object indices. When there are no more free elements in the list, Perst allocates handles from the unused part of the index. When there is no more space in the index, it is reallocated. The object index is the only entity in the database which is not cloned on modification. Instead, two copies of the object index are always used.
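A minimal sketch of such a free-handle list (hypothetical names, not Perst code): a freed handle stores the link to the next free handle directly in its own index slot, so the list needs no extra storage.

```java
import java.util.Arrays;

// Toy model of OID handle allocation: reuse freed handles first,
// then take slots from the unused tail of the index, growing it when full.
class HandleAllocator {
    long[] handles;        // object index: offsets, or next-free links for freed slots
    int freeHead = -1;     // head of the free-handle list (-1 = list is empty)
    int used;              // first never-allocated slot

    HandleAllocator(int capacity) { handles = new long[capacity]; }

    int allocate() {
        if (freeHead >= 0) {                // reuse a freed handle first
            int oid = freeHead;
            freeHead = (int) handles[oid];  // next-free link is stored in the slot itself
            return oid;
        }
        if (used == handles.length)         // index exhausted: reallocate (toy policy: double)
            handles = Arrays.copyOf(handles, handles.length * 2);
        return used++;
    }

    void free(int oid) {                    // push the handle onto the free list
        handles[oid] = freeHead;
        freeHead = oid;
    }
}
```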

There are some predefined OID values in Perst. OID 0 is reserved as the invalid object identifier. OIDs starting from 1 are reserved for bitmap pages. The number of bitmap pages depends on the database’s maximum virtual space. For one terabyte virtual space, 8 kb page size and 64 byte allocation quantum, 32 K bitmap pages are required. So, 32 K handles are reserved in the object index for the bitmap. Bitmap pages are allocated on demand, when the database size is extended. Hence, the OID of the first user object will be 0x8002.

The recovery procedure is trivial in Perst. There are two instances of the object index, one of which is current and the other corresponds to the consistent database state. When a database is opened, Perst checks the database header to detect whether the database was closed normally. If not (the dirty flag is set in the database header), Perst performs database recovery. Recovery is very similar to the rollback of a transaction. The indicator of the current index in the database header is used to determine the index corresponding to the consistent database state, and the object handles from this index are copied to the other object index, eliminating all changes done by the uncommitted transaction. Since the only action performed by the recovery procedure is copying of the object index (in reality only handles having different values in the current and shadow indices are copied, to reduce the number of modified pages) and the size of the object index is small, recovery can be performed very fast. A fast recovery procedure reduces the "out-of-service" time of an application.
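The essence of recovery - copying only the handles that differ between the consistent index and the dirty one - can be sketched in a few lines (illustrative code with hypothetical names, not Perst's implementation):

```java
// Toy sketch of crash recovery: restore the dirty index from the
// consistent (shadow) one, touching only the handles that differ.
class Recovery {
    static int recover(long[] consistent, long[] dirty) {
        int copied = 0;
        for (int oid = 0; oid < consistent.length; oid++)
            if (dirty[oid] != consistent[oid]) {  // only differing handles are copied,
                dirty[oid] = consistent[oid];     // minimizing the number of modified pages
                copied++;
            }
        return copied;
    }
}
```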

Where should Perst be used?

Perst is a simple and fast embedded database engine. If your applications need an embedded database engine and do not need to execute complex SQL queries, and the only thing you need is to be able to store/fetch/locate objects in the database using navigation through references or indexed search by keys, then Perst is what you need. It will provide much better performance than a relational database and other (more sophisticated) object-oriented databases.

The table below summarizes the special features of Perst:

  1. Tight and transparent integration with the programming language.
  2. No gap between the database and the application data model.
  3. Easy to use.
  4. Minimal requirements (the Perst package itself is only 51 kb in size and it can be configured to use minimal memory and disk space during operation).
  5. High performance (no overheads of communication, locking, parsing and executing queries).
  6. Fault tolerance (transaction support).
  7. Fast recovery after fault.
  8. Zero administration effort (the database consists of a single file); there is no need to define or tune any database parameters. A situation like a transaction log overflow can never happen.
Certainly nothing in this world can have only positive features. Perst lacks the following features:
  1. A procedural query language.
  2. Remote access by multiple clients (unless you implement your own server).
  3. Data distribution.
  4. Support for standards such as ODMG.

Quick start

Perst is distributed together with a prebuilt perst.jar. You can also build it yourself using the compile.bat script in the src directory. The only thing you need to do to work with the database is to include this JAR in your class path. A template for a Perst application is as follows:

import org.garret.perst.*;

public class YourPersistentClass extends Persistent {
    int    x;
    String y;
    Link   links;
    ...


    void doUpdate() { 
        x = 1;
        y = "Hello World";
        links = getStorage().createLink();
        store(); // save changes in the database
    }
}

public class YourRoot extends Persistent {
    YourPersistentClass ref;
    Index myIndex;

    void foo() { 
        ref.doUpdate(); // the referenced object does not have to be explicitly loaded
    }

    
}


public class YourApplication {
    static public void main(String[] args) { 
        String databaseFilePath = args[0]; // path to the database file
        int pagePoolSize = Integer.parseInt(args[1], 10); // size of page pool in bytes

        Storage storage = StorageFactory.getInstance().createStorage();
        storage.open(databaseFilePath, pagePoolSize);

        YourRoot root = (YourRoot)storage.getRoot();
        if (root == null) { 
            // Storage was not initialized yet
            root = new YourRoot();
            root.myIndex = storage.createIndex(String.class, true); // unique index
            storage.setRoot(root);
        }

        root.foo(); // all objects referenced from root are implicitly loaded
        YourPersistentClass obj = (YourPersistentClass)root.myIndex.get(new Key("John")); // find the object with key "John"
        
        ... // do something else with database
        
        storage.commit(); // commit transaction

        ... // do something else with database

        storage.close(); // commit the last transaction and close the database
    }
}

You can also look at these Perst examples:

Simple.java
This example illustrates the basic operations with Perst storage
Guess.java
A very simple game: "Guess An Animal". This program stores a user's answers in a tree structure and uses this information to construct its questions.
TestIndex.java
Test for a B+Tree. This test stores, finds and deletes objects in the index and also measures performance.
TestIndex2.java
Test for a T-Tree. The same test as TestIndex.java but implemented using SortedCollection (T-Tree).
TestKDTree.java
Test for a multidimensional index (KD-Tree). This test stores, finds and deletes objects in the multidimensional index and also measures performance. It illustrates Query-By-Example search using a multidimensional index.
TestCompoundIndex.java
Test of B+Tree compound indices.
TestIndexIterator.java
Test for iterators of a B+Tree.
TestLink.java
Test of relation handling between objects in Perst using the Link interface.
TestSSD.java
Supplier-Shipment-Detail example. This example illustrates how joins can be implemented in Perst.
TestJsqlJoin.java
Test of index joins in Database class (used when JSQL predicate contains indirect access to the fields, like "album.label.name=?")
TestCodeGenerator.java
Test of code generator for query predicates
TestAutoIndices.java
Test automatic creation of indices for fields used in JSQL query predicates
JsqlSSD.java
Supplier-Shipment-Detail example. The same data model as in the example above, but implemented using JSQL and Database class.
TestVersion.java
Test of versioning support. This test illustrates usage of the Version and VersionHistory classes and their performance.
TestSOD.java
Supplier-Order-Detail example. This example illustrates an alternative approach to implementing many-to-many relations, based on the Projection class.
TestRtree.java
Test of the spatial index.
TestR2.java
Test of the R2 spatial index.
TestTtree.java
Test of sorted collection.
TestBit.java
Test of bit index.
TestRaw.java
Example of serialization of standard Java collections.
TestBlob.java
Test and example of handling large binary objects in Perst.
TestAlloc.java
Test of custom allocator and multifile.
TestTimeSeries.java
Test and example of handling time series data.
TestXML.java
Test of XML import/export.
TestBackup.java
Test of database online backup.
TestConcur.java
Test of the Perst locking mechanism and concurrent access to the database.
TestServer.java
Test of concurrent access to the database using serializable per-thread transactions.
TestDbServer.java
Test of concurrent access to the database using Database wrapper instead of native API.
TestGC.java
Test of implicit memory deallocation (garbage collection) in Perst.
TestThickIndex.java
Test of indices with a large number of duplicated key values.
TestSet.java
Test of IPersistentSet.
TestList.java
Test of IPersistentList.
TestRndIndex.java
Test of the random access index (access to elements both by key and position).
TestMod.java
Example of updating persistent objects and maintaining indices for them.
TestMap.java
Test of the persistent map class.
TestReplic.java
Test of the Perst replication mechanism (static slave nodes).
TestReplic2.java
Test of the Perst advanced replication mechanism for a main-memory database (dynamic slave nodes).
TestJSQL.java
Test of JSQL.
TestFullTextIndex.java
Test of full text search capabilities.

The tests are located in the tst directory and can be built using the compile.bat script.

JDK 1.4

Before version 4.0 of Perst, the default Perst API was for JDK 1.4. Starting from version 4.0, the default Perst API is for JDK 1.5 and higher. So now src contains the Perst sources for JDK 1.5 with support for generic classes, and perst.jar corresponds to the JDK 1.5 version. If you need a version of Perst for JDK 1.4, please use perst14.jar, whose sources are in the src14 directory.

Perst Lite

Many embedded systems are based on a restricted Java environment - JDK 1.1, J2ME, etc. Perst provides a version of the database for such environments. The JDK 1.1 version of Perst is compatible with CLDC 1.1. It does not use reflection, serialization or the JDK 1.2 collection classes. The main differences between the mainstream and JDK 1.1 versions of Perst (Perst Lite) are the following:

Things that are missing in Perst Lite:

  1. Orthogonal persistence: the ability to store in the database classes that do not implement the IPersistent interface
  2. Compound indices for random access and alternative B-Tree
  3. Memory usage information
  4. File locking
  5. Multi-client access to the database
  6. User-defined class loaders
  7. Automatic schema evolution
  8. Custom allocators
  9. Multifiles

Changes in functionality:

  1. The pack/unpack methods must be provided by the programmer or generated by the SerGen utility.
  2. The persistence-capable classes must have a public default constructor.
  3. The object cache is reset after transaction commit (*).
  4. The number of objects involved in a transaction is limited by the amount of physical memory (*).

(*) Only if weak references are not supported by the target environment. Weak references are available in CLDC 1.1.

Perst Lite does not use the Java reflection mechanism. So, to be able to store/fetch persistent objects to/from the database, the programmer must provide special pack/unpack routines. Perst provides the ISerializable interface, which defines the following methods:

/**
 * Interface which should be implemented by any object that can be serialized.
 * Implementation of writeObject readObject methods should 
 * store/fetch content of the object to/from the provided stream.
 * The order of storing/fetching fields should be the same in both methods.
 */
public interface ISerializable { 
    /**
     * Store object fields in the stream.
     * @param out stream to which object is written
     */
    void writeObject(IOutputStream out);    
    /**
     * Fetch object fields from the stream
     * @param in stream from which object data should be fetched
     */
    void readObject(IInputStream in);

    /**
     * Check if the object can be pinned in the cache to survive transaction commit.
     * Commit performs a cleanup of the strong object cache, leaving only a limited number
     * of the most frequently used objects (pinned objects). 
     * It is not possible to pin an object which contains references to other persistent objects,
     * since such a reference can point to an object which is thrown away from the cache.
     * @return true if the object can be pinned in memory (the object contains no references)
     */
    boolean isPinnable();
}

So, each persistence-capable class should implement this interface (actually it should be derived from the org.garret.perst.Persistent class, which provides the basic implementation of this interface) and implement the writeObject/readObject methods. The IOutputStream and IInputStream interfaces provide methods for storing and fetching, respectively, values of different types. Below are example definitions of pack/unpack methods:

public class Test extends Persistent { 
    int    i;
    float  f;
    String s;
    Test   o;

    public void writeObject(IOutputStream out) {
        super.writeObject(out);
        out.writeInt(i);
        out.writeFloat(f);
        out.writeString(s);
        out.writeObject(o);
    }

    public void readObject(IInputStream in) {
        super.readObject(in);
        i = in.readInt();
        f = in.readFloat();
        s = in.readString();
        o = (Test)in.readObject();
    }
}
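The same contract - readObject must fetch fields in exactly the order writeObject stored them - can be illustrated with plain java.io streams (a self-contained sketch, not Perst's IOutputStream/IInputStream):

```java
import java.io.*;

// Toy illustration of symmetric pack/unpack routines: the read side
// mirrors the write side field for field, in the same order.
class Packed {
    int    i;
    float  f;
    String s;

    void writeObject(DataOutputStream out) throws IOException {
        out.writeInt(i);     // field order here...
        out.writeFloat(f);
        out.writeUTF(s);
    }

    void readObject(DataInputStream in) throws IOException {
        i = in.readInt();    // ...must be mirrored exactly here
        f = in.readFloat();
        s = in.readUTF();
    }
}
```

Note that Perst's own stream types also handle nulls and object references, which `writeUTF`/`readUTF` do not; that is one reason the real interface defines its own streams.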
Also, because of restrictions of the JDK 1.1 environment, a persistence-capable class should have a public default constructor.

Self-made reflection:

CLDC 1.1 doesn't support reflection. But to provide the JSQL query language and field indices, Perst needs a reflection mechanism. So if runtime information about classes is not provided by the Java runtime, it has to be provided explicitly by the programmer. Perst Lite includes the org.garret.perst.reflect package, which is a replacement for the standard java.lang.reflect package. However, descriptors of classes, fields and methods should be created by the programmer or generated by the SerGen utility. Reflection support, as well as the Perst classes using reflection, is provided in the perst-rms-reflect.jar and perst-jsr75-reflect.jar archives. Reflection support adds about 100 kb to the jar file size. These archives should be used only if you need JSQL or field indices. Below is an example of describing a class using the org.garret.perst.reflect package:
import org.garret.perst.*;
import org.garret.perst.reflect.*;

import java.util.Date;

class MyClass extends Persistent { 
    int i;
    String  s;
    MyClass next;

    void foo(int x) { 
        // some user's code...
    }

    String f(Date d, float f) { 
        // some user's code...
    }

    int g() { 
        // some user's code...
    }

    // It is necessary to create an instance of the Type class to register it in the reflection provider
    public static final Type TYPE = 
               new Type(MyClass.class, // described class
                        Persistent.class, // base class
                        new Class[]{}, // implemented interfaces
                        new Field[] {  // list of the fields                            
                            new Field(Integer.class, // field type: J2ME doesn't provide classes for builtin types, 
                                                     // so we have to use Integer.class instead of int.class
                                      "i",  // field name
                                      Modifier.Indexable|Modifier.Unique) // field modifiers used by Database class
                                      {
                                           // Getter method
                                           public Object get(Object obj) { 
                                                return new Integer(((MyClass)obj).i); 
                                           }
                                           // Setter method
                                           public void set(Object obj, Object value) { 
                                                ((MyClass)obj).i = ((Integer)value).intValue(); 
                                           }
                                           // It is not necessary to redefine getInt/setInt methods 
                                           // because default implementation will use set/get method, 
                                           // but certainly such implementation is more efficient
                                           public int getInt(Object obj) { 
                                               return ((MyClass)obj).i; 
                                           }
                                           public void setInt(Object obj, int value) { 
                                               ((MyClass)obj).i = value; 
                                           }
                                      },
                            new Field(String.class, "s", Modifier.FullTextIndexable) {
                                public Object get(Object obj) { return ((MyClass)obj).s; }
                                public void set(Object obj, Object value) { ((MyClass)obj).s = (String)value; }
                            },
                            new Field(MyClass.class, "next") {
                                public Object get(Object obj) { return ((MyClass)obj).next; }
                                public void set(Object obj, Object value) { ((MyClass)obj).next = (MyClass)value; }
                            },
                        }, 
                        new Method[] { // list of the methods   
                            new Method(null, // the type of a void method should be null
                                       "foo", // method name
                                       new Class[]{Integer.class}) // method parameter types   
                                       { 
                                           // Invocation method   
                                           public Object invoke(Object obj, Object[] args) { 
                                               ((MyClass)obj).foo(((Integer)args[0]).intValue()); 
                                               return null; // In case of void method we should return null 
                                           }
                                       },
                            new Method(String.class, // method return type
                                       "f", // method name
                                       new Class[]{Date.class, Float.class}) // method parameter types   
                                       {
                                           public Object invoke(Object obj, Object[] args) { 
                                                return ((MyClass)obj).f((Date)args[0], ((Float)args[1]).floatValue()); 
                                           }
                                       },
                            new Method(Integer.class, "g", new Class[]{}) {
                                public Object invoke(Object obj, Object[] args) { return new Integer(((MyClass)obj).g()); }
                            },
                        }) 
                        {
                            // Method creating instance of the class (default implementation uses
                            // java.lang.Class.newInstance, but providing specialized implementation 
                            // allows to instantiate non-public classes with non-public default constructor
                            public Object newInstance() { 
                                return new MyClass(); 
                            }
                        };
}
Please look at the JSQLTest1 and JSQLTest2 MIDP examples in perst/tst11/midp, which illustrate the use of reflection, JSQL and the Database class in Perst Lite.
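The core mechanics of these hand-written descriptors - an abstract accessor overridden by an anonymous subclass per field - can be shown without any Perst classes (a self-contained sketch with hypothetical names):

```java
// Miniature of the getter/setter pattern used by org.garret.perst.reflect:
// a field descriptor whose anonymous subclass knows how to read and write
// one concrete field, giving reflection-like access without reflection.
abstract class FieldDescriptor {
    final String name;
    FieldDescriptor(String name) { this.name = name; }
    abstract Object get(Object obj);
    abstract void set(Object obj, Object value);
}

class Point {
    int x;

    // One descriptor instance per field, registered as a static constant
    static final FieldDescriptor X = new FieldDescriptor("x") {
        Object get(Object obj) { return Integer.valueOf(((Point) obj).x); }
        void set(Object obj, Object value) { ((Point) obj).x = ((Integer) value).intValue(); }
    };
}
```

The wrapper types (`Integer` instead of `int`) mirror the J2ME limitation mentioned in the example above: without classes for built-in types, primitive values must be boxed at the descriptor boundary.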

Perst Lite MIDP storage support

A native storage subsystem of the Mobile Information Device Profile (MIDP) is the Record Management System (RMS), an API that gives MIDP applications local, on-device data persistence. On many MIDP-enabled devices today, such as Blackberry devices, RMS is the only facility for local data storage. Other devices support a conventional file system through optional packages. Perst Lite supports both the RMS and extended configurations. Perst Lite is built to support three storage configurations: JDK 1.1 with standard file system support (perst11.jar), MIDP/CLDC 1.1 (RMS storage only, perst-rms.jar) and MIDP/CLDC 1.1 with the optional JSR 75 package (perst-jsr75.jar). The packages perst-rms-reflect.jar and perst-jsr75-reflect.jar also provide reflection support and the Perst facilities based on reflection: field indices, the JSQL query language and the RDMS facade class. There is also a separate package with XML import/export support, perst11xml.jar, which can be used in conjunction with any Perst Lite package to provide a way of exporting/importing a Perst Lite database to/from XML.

Note for Blackberry users:

JSR 75 performance issues

JSR 75 makes it possible to work with files in the device file system. But it is oriented mostly toward sequential access to data: playing music or video, displaying text, etc. JSR 75 provides no way to read from an arbitrary position in a file except by using the java.io.InputStream.skip method, which is implemented very naively in some J2ME runtimes: it just reads data from the stream byte by byte until it reaches the requested position. For example, on a Nokia 6110 (S60 3rd edition), performing 10000 index searches through 10000 records takes ... more than 3 hours! And the database size in this example is only about 600 kb, so it can easily be loaded into memory.

That is why Perst Lite includes a special version of the page pool - InfinitePagePool - which preloads the whole database file into memory. An infinite page pool is also available in the mainstream Perst version, but the Perst Lite implementation is more efficient and requires less memory. The main difference is that it loads the whole database into memory when the database is opened. With this implementation, the search time of the test mentioned above is about 8 seconds. Compare that with 3 hours!

Importing data to the Perst Lite database

Usually applications need to load some initial data into the database. Since the performance of mobile phones and similar devices where Perst Lite is used is very limited, import may require a significant amount of time. This is why it may be very useful to prepare the database on a PC and upload it to the target device.

With JSR 75 it is more or less straightforward - you can create the database file on a PC using perst11.jar or any emulator which stores files created with the JSR 75 API as normal files in the PC file system. But JSR 75 is not available everywhere, and the format of RMS storage is opaque and vendor specific. It is, however, possible to import data into the RMS record storage at a low level (as Perst pages) - this is much faster than just inserting data into the database (because there is no need to construct indices, allocate space, pack objects, etc.). The implementation of org.garret.perst.impl.RmsFile on top of RMS includes a static importData method which can perform such an import from an arbitrary input stream:

    /**
     * Import data from the specified input stream to the record store
     * @param storageName storage name prefix. If there is a limitation on storage size, then several storages with 
     * this prefix and suffixes 0, 1, 2,... will be created.
     * @param in input stream. Data can be imported from a resource file (jar file), from a network connection or from another
     * data source
     * @return true if data was successfully imported, false if the storage is not empty or a read operation from the input stream
     * failed
     */
    public static boolean importData(String storageName, InputStream in) throws IOException, RecordStoreException;
Using this method, it is possible to prepare a database file on a PC (using an emulator or the perst11.jar library), include it in the midlet .jar file as a resource file and import this file into the RMS storage:
    protected void startApp()
    {
        storage = StorageFactory.getInstance().createStorage();
        try { 
            org.garret.perst.impl.RmsFile.importData("testindex.dbs", getClass().getResourceAsStream("/testindex.dbs"));
        } catch (Exception x) { 
            // import failed: report the error or proceed with an empty database
        }
        storage.open("testindex.dbs");
        ...
    }
The importData method imports the resource file only when the RMS storage is empty, so the programmer does not need to perform an extra check of whether the database has already been initialized.

It is possible to use the importData method not only for importing resource files from the .jar file, but also for importing data from any data source, for example a network connection.

SerGen utility

Since manually describing all class components can be difficult and error-prone, Perst Lite provides a special utility called SerGen which can do this work for you, i.e. generate the writeObject/readObject methods. The only thing you have to do is add stubs for the writeObject/readObject methods in the persistence-capable classes and pass their paths as arguments to the SerGen utility. Requiring explicit stubs means that SerGen does not have to load the whole project, resolve all class paths and source locations, or guess the right place and indentation at which to insert the generated methods. Instead, SerGen uses a simplified approach that only requires the programmer to explicitly specify the classes and stubs for the writeObject/readObject/getClassDescriptor methods. This solves both problems: SerGen does not have to guess which classes are persistence-capable or where to generate the bodies of these methods.

If you do not want the SerGen utility to replace writeObject/readObject/getClassDescriptor methods written explicitly by you in some particular class, you can mark them with a /* @sealed */ comment. In this case SerGen will leave them untouched.

The getClassDescriptor method should be specified if you are going to use reflection (field indices, JSQL queries, etc.). You can also mark indexable fields for the Database class using the @modifier tag in comments. Supported modifiers are enumerated in the org.garret.perst.Modifier class. These modifiers should be specified after the @modifier tag, separated by spaces or commas:

    /**
     * Some integer key field
     * @modifier Indexable, Unique
     */
    int key;
To make the Javadoc tool correctly process this tag, you should add the -tag modifier:f:"Modifiers:" option to the javadoc command line.

The SerGen utility is invoked as follows:

      java SerGen list-of-files
Here, list-of-files is a list of source files (*.java) or directories where these source files are located. SerGen recursively traverses these directories, selects files with the .java extension and tries to locate the writeObject/readObject methods in them. If the methods are found, the file is patched: SerGen first generates an XXX.java.tmp file, removes the original file, and renames XXX.java.tmp to XXX.java. The paths of all the patched files are reported on the console output.
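The patch step can be sketched with java.nio.file (an illustrative sketch of the write-tmp/delete/rename sequence, not SerGen's actual code):

```java
import java.io.IOException;
import java.nio.file.*;

// Sketch of SerGen's patch step: write the patched source to a .tmp
// file next to the original, then replace the original with it.
class Patcher {
    static void patch(Path source, String newContent) throws IOException {
        Path tmp = source.resolveSibling(source.getFileName() + ".tmp");
        Files.write(tmp, newContent.getBytes("UTF-8")); // generate XXX.java.tmp
        Files.delete(source);                           // remove the original file
        Files.move(tmp, source);                        // rename XXX.java.tmp to XXX.java
    }
}
```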

This is an example of the original file (written by the programmer):


import org.garret.perst.*;
import org.garret.perst.reflect.*;

import java.util.Date;

class MyClass extends Persistent { 
    /**
     * Some integer field
     * @modifier Indexable, Unique
     */
    int     i;

    /**
     * Some string field
     * @modifier FullTextIndexable
     */
    String  s;

    MyClass next;

    void foo(int x) { 
        // some user's code...
    }

    String f(Date d, float f) { 
        // some user's code...
    }

    int g() { 
        // some user's code...
    }


    public void writeObject(IOutputStream out) {
    }

    public void readObject(IInputStream in) {
    }

    private static Type getClassDescriptor() {
        return null;
    }
    
    public static final Type TYPE = getClassDescriptor(); 
}
And this is the result of applying the SerGen utility to this file:
import org.garret.perst.*;
import org.garret.perst.reflect.*;

import java.util.Date;

class MyClass extends Persistent { 
    /**
     * Some integer field
     * @modifier Indexable, Unique
     */
    int     i;

    /**
     * Some string field
     * @modifier FullTextIndexable
     */
    String  s;

    MyClass next;

    void foo(int x) { 
        // some user's code...
    }

    String f(Date d, float f) { 
        // some user's code...
    }

    int g() { 
        // some user's code...
    }


    public void writeObject(IOutputStream out) {
        super.writeObject(out);
        out.writeInt(i);
        out.writeString(s);
        out.writeObject(next);
    }

    public void readObject(IInputStream in) {
        super.readObject(in);
        i = in.readInt();
        s = in.readString();
        next = (MyClass)in.readObject();
    }

    private static Type getClassDescriptor() {
        return new Type(MyClass.class, Persistent.class, new Class[]{}, new Field[]{
            new Field(Integer.class, "i", Modifier.Indexable|Modifier.Unique) {
                public Object get(Object obj) { return new Integer(((MyClass)obj).i); }
                public void set(Object obj, Object value) { ((MyClass)obj).i = ((Integer)value).intValue(); }
                public int getInt(Object obj) { return ((MyClass)obj).i; }
                public void setInt(Object obj, int value) { ((MyClass)obj).i = value; }
            },
            new Field(String.class, "s", Modifier.FullTextIndexable) {
                public Object get(Object obj) { return ((MyClass)obj).s; }
                public void set(Object obj, Object value) { ((MyClass)obj).s = (String)value; }
            },
            new Field(MyClass.class, "next") {
                public Object get(Object obj) { return ((MyClass)obj).next; }
                public void set(Object obj, Object value) { ((MyClass)obj).next = (MyClass)value; }
            },}, new Method[]{
            new Method(null, "foo", new Class[]{Integer.class}) {
                public Object invoke(Object obj, Object[] args) { ((MyClass)obj).foo(((Integer)args[0]).intValue()); return null; }},
            new Method(String.class, "f", new Class[]{Date.class, Float.class}) {
                public Object invoke(Object obj, Object[] args) { return ((MyClass)obj).f((Date)args[0], ((Float)args[1]).floatValue()); }},
            new Method(Integer.class, "g", new Class[]{}) {
                public Object invoke(Object obj, Object[] args) { return new Integer(((MyClass)obj).g()); }},}) {
            public Object newInstance() { return new MyClass(); }
        };
    }
    
    public static final Type TYPE = getClassDescriptor(); 
}

SerGen supports custom serializers, with three different schemas triggered by the field annotations @serializable, @binary and @property.

Fields marked with the @serializable annotation must have types that implement the org.garret.perst.ISerializable interface. The field's value is serialized inside the object containing the field, using the readObject/writeObject methods of the ISerializable interface. SerGen generates the following code in the pack/unpack methods for such fields:

class Diagnosis {
    /**
     * Unique identifier
     * @serializable
     */
    UUID id;
 
    public void writeObject(IOutputStream out) {
        super.writeObject(out);
        id.writeObject(out);
     } 
 
    public void readObject(IInputStream in) {
        super.readObject(in);
        id = new UUID(); id.readObject(in);
    }
}
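For illustration, here is a minimal, self-contained sketch of the @serializable pattern using java.io streams in place of Perst's IOutputStream/IInputStream; the Uuid class and its fields are hypothetical:

```java
import java.io.*;

// Self-contained sketch of the @serializable pattern: the field's class
// serializes itself via its own writeObject/readObject methods.
// (Stream types are java.io here; in Perst they would be
// IOutputStream/IInputStream from org.garret.perst.)
class Uuid {
    long time;
    long clockSeqAndNode;

    void writeObject(DataOutputStream out) throws IOException {
        out.writeLong(time);
        out.writeLong(clockSeqAndNode);
    }

    void readObject(DataInputStream in) throws IOException {
        time = in.readLong();
        clockSeqAndNode = in.readLong();
    }
}

public class SerializableFieldDemo {
    public static void main(String[] args) throws IOException {
        Uuid id = new Uuid();
        id.time = 42;
        id.clockSeqAndNode = 7;

        // The containing object's writeObject would delegate here
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        id.writeObject(new DataOutputStream(buf));

        // ...and its readObject would reconstruct the value the same way
        Uuid copy = new Uuid();
        copy.readObject(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(copy.time + " " + copy.clockSeqAndNode);
    }
}
```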
Fields marked as @binary are also serialized by the field's own class. But in this case, the toByteArray method and a constructor taking a byte array parameter are used instead of writeObject/readObject. So @binary should be used when the goal is to avoid adding a dependency on Perst code to the class, because serialization is done through a plain byte array.

There is yet another difference from the @serializable annotation: the type of such a field, as reported by the Perst reflection mechanism, is byte[], and SerGen generates the code to convert the object to/from a byte array. So it is possible to include this field in indices (which is especially convenient when using the Database class - the field can simply be marked as indexable). The field will be inserted into an index with a byte-array key, and byte arrays can be used to locate the object.

Below is code generated by SerGen for a @binary field:

class Diagnosis {
    /**
     * Unique identifier
     * @binary
     * @modifier Unique,Indexable,FixedSize
     */
    UUID id;
 
    public void writeObject(IOutputStream out) {
        super.writeObject(out);
        out.writeArrayOfByte(id.toByteArray());
     } 
 
    public void readObject(IInputStream in) {
        super.readObject(in);
        id = new UUID(in.readArrayOfByte());
    }
    private static Type getClassDescriptor() {
        return new Type(Diagnosis.class, Persistent.class, new Class[]{}, new Field[]{
            new Field(byte[].class, "id", Modifier.Indexable|Modifier.Unique|Modifier.FixedSize) {
                public Object get(Object obj) { return ((Diagnosis)obj).id.toByteArray(); }
                public void set(Object obj, Object value) { ((Diagnosis)obj).id = new UUID((byte[])value); }
            },
        });
    }    
    public static final Type TYPE = getClassDescriptor(); 
}
Note the FixedSize modifier. It informs Perst that the byte array representing the key value always has the same length, which allows a more efficient B-Tree implementation to be used than the one needed for varying-length keys.
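A sketch of a class that fits the @binary schema, using a hypothetical UuidKey type: it exposes toByteArray() and a byte-array constructor, needs no dependency on Perst code, and its serialized form always has the same length, so the field could also be marked FixedSize:

```java
import java.nio.ByteBuffer;

// Hypothetical value class suited to the @binary schema: it exposes
// toByteArray() and a constructor taking byte[]. Its serialized form
// always has the same length (16 bytes), which is what the FixedSize
// modifier asserts.
public class UuidKey {
    private final long time;
    private final long clockSeqAndNode;

    public UuidKey(long time, long clockSeqAndNode) {
        this.time = time;
        this.clockSeqAndNode = clockSeqAndNode;
    }

    // Reconstruct the value from its 16-byte serialized form
    public UuidKey(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        this.time = buf.getLong();
        this.clockSeqAndNode = buf.getLong();
    }

    // Always returns exactly 16 bytes, so the field can be marked FixedSize
    public byte[] toByteArray() {
        return ByteBuffer.allocate(16).putLong(time).putLong(clockSeqAndNode).array();
    }

    public long getTime() { return time; }
    public long getClockSeqAndNode() { return clockSeqAndNode; }
}
```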

If it is not possible or desirable to change the sources of the class whose value is stored in the field, the @property modifier can be used. In this case, the declaring class should provide getXXX(IInputStream in) and setXXX(IOutputStream out, XXX value) methods, where XXX is the type of the field, as shown below:

class Diagnosis {
    /**
     * Unique identifier
     * @property
     */
    UUID id;
 
    public void writeObject(IOutputStream out) {
        super.writeObject(out);
        setUUID(out, id);
     } 
 
    public void readObject(IInputStream in) {
        super.readObject(in);
        id = getUUID(in);
    }
 
    // These methods are written by the programmer
    private void setUUID(IOutputStream out, UUID id) {
        out.writeLong(id.getTime());
        out.writeLong(id.getClockSeqAndNode());
    }
 
    private UUID getUUID(IInputStream in) {
        return new UUID(in.readLong(), in.readLong());
    }
}

DBConv utility

The DBConv utility performs offline compression/decompression/encryption/decryption of an existing database file. Arguments of this utility are the following:
    java -classpath dbconv.jar DBConv [-encrypt KEY | -decrypt KEY] [-inflate|-deflate] DATABASE-FILE(s)
If you determine that your database has grown too large, or you decide to encrypt the database, you can use this utility. But do not forget to update your application to incorporate the encryption and/or compression process, as well.

Encryption is performed in the same way by standard Perst and by Perst Lite for Java ME, so this utility can be used with both for encryption/decryption. However, the format of a compressed database differs between standard Perst and Perst Lite: Perst Lite supports compression only when using the RMS (record management storage) package, and this utility can perform only compression/decompression (not encryption) for Perst Lite.

This utility is especially useful for transferring a compressed database between a PC and a target device. As mentioned above, Perst Lite supports compression only when using RMS. But the RMS package can be used on a PC only under the device emulator. If your goal is to use the perst11.jar package to prepare a Perst database to be imported onto a mobile device, or to access a database that is exported from mobile device, you should use the DBConv utility to compress the database before transferring it to the device, and to decompress the database after transferring it from the device. An uploaded compressed database can be imported into RMS storage using the org.garret.perst.impl.RmsFile.importDatabase method. And to transfer the database back to the PC, use the org.garret.perst.impl.RmsFile.exportDatabase method.

Weak references and Perst Lite

Originally, Perst Lite was created for CLDC 1.0. Because CLDC 1.0 did not support weak references, all persistent objects accessed by a transaction were pinned in memory until the end of the transaction. A drawback was that with large transactions, this could lead to memory overflow. Also, in this approach, the object cache is cleared upon transaction commit. Some programmers store a reference to the root object when the database opens, and use this variable to access the root object. This sometimes caused a problem after the object cache cleanup: the variable referred to a stale object, resulting in a bug. (Since the object cache is cleared, it is not valid to use ANY reference to a persistent object obtained in a previous transaction.)

Such a problem occurs only with a strong object cache, as opposed to a weak object cache. By default, if the java.lang.WeakReference class is available, Perst Lite uses a weak object cache. CLDC 1.1 supports weak references, so in CLDC 1.1, Perst Lite always uses a weak object cache. In this case, it is unnecessary to pin all objects in the cache throughout the transaction: only strongly referenced and modified objects are pinned. However, please notice that since CLDC 1.1 doesn't support a finalization mechanism (which would enable a garbage collector to handle deallocation of objects), Perst Lite has to hold all modified objects in memory. So memory overflow is still possible if a transaction updates or inserts too many objects. To avoid this, limit the transaction size, for example by splitting a large transaction into several smaller ones when possible.

Weak object caching eliminates the problem of dangling references to the root object (or any other persistent object). If an object is referenced from some variable, it stays in the object cache until there are no more strong references to it. But this may cause another problem: if persistent object A contains a reference to persistent object B, and A is referenced from some program variable and so pinned in the object cache, then B is also pinned in the object cache (since it is also strongly referenced). So conceivably, if all objects in the database are directly or indirectly referenced from some root object(s), and these root object(s) are pinned in memory, then the whole database may be pinned in memory. To prevent such a scenario, it is recommended to use Perst collection classes instead of self-made collections based on direct references (such as linked lists).

The implementation of the weak object cache for Perst Lite is located in a separate directory (to make it possible to build the Perst library without using weak references): perst/src11/weak/org/garret/perst/impl/LruObjectCache.java. If you are going to build your own version of the Perst library including some subset of Perst classes, please do not forget about this class.

Please note that in the case of a rollback, you still have to discard all references to persistent objects stored in program variables; it doesn't matter whether a weak or strong cache is used. But Perst offers the possibility to reload modified objects during transaction rollback, and so in some cases the values of local variables can be reused.

Setting the "perst.reload.objects.on.rollback" property instructs Perst to reload all objects modified by the aborted (rolled back) transaction. This takes additional processing time, but then it is not necessary to discard references stored in variables, unless they point to objects created by the rolled-back transaction (which were invalidated when the transaction was aborted). Unfortunately, there is no way to prohibit access to such objects or somehow invalidate references to them, so this option should be used with care.

Full text search

Starting from version 3.0, Perst contains a built-in full text search engine. Full text search is a very important addition to traditional DBMS functionality since, in most cases of dealing with text data (addresses, book titles and authors, composition names, ...), it is more convenient to offer full text search than to make the user specify an exact phrase or its prefix. The fact that full text search is not widely used in most systems is mostly explained by the lack of compact and efficient full text search engines. This is especially important for embedded systems, such as MP3 players and TV set-top boxes.

Perst provides a very simple yet powerful and flexible full text search engine. A general search engine consists of several components, among them a lemmatizer (which extracts words from the document text) and a stemmer (which normalizes word forms).

The Perst search engine assumes that the lemmatizer and stemmer will be provided by the user (since they depend on the supported languages and document formats). The default implementation just extracts words from the document text as sequences of letters and digits and performs no stemming at all. Letter case is ignored.
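The default word extraction described above can be sketched as follows (an illustration, not Perst's actual implementation): words are maximal runs of letters and digits, lower-cased, with no stemming.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the default word extraction: words are maximal runs of
// letters and digits, lower-cased, no stemming.
public class SimpleTokenizer {
    public static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char ch : text.toCharArray()) {
            if (Character.isLetterOrDigit(ch)) {
                word.append(Character.toLowerCase(ch));
            } else if (word.length() > 0) {
                words.add(word.toString());
                word.setLength(0);
            }
        }
        if (word.length() > 0) {
            words.add(word.toString());
        }
        return words;
    }
}
```

For example, tokenize("Santa Claus, 2nd edition!") yields [santa, claus, 2nd, edition].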

Perst supports basic logical operations in search queries: AND, OR, NOT - for example, (MultidimensionalIndex OR KDTree) AND NOT RTree. The names of these operators can be redefined in the FullTextSearchHelper class. To perform a strict match of a phrase, put it in quotes: "to be or not to be". That is actually all there is to the syntax of full text queries - as you can see, it is very simple. Since there may be a very large number of documents matching a given query, locating all of them may take a lot of time, and the user rarely needs all these results (in most cases the user never looks beyond the first ten). This is why the Perst search engine allows you to restrict the number of returned documents and the search time. Even when the search engine doesn't return all matched documents, it is still able to provide an estimation of their number.

Customization and tuning of full text search in Perst are performed using the FullTextSearchHelper class. This class contains methods for parsing the query, splitting document text into tokens, stemming words and adjusting search parameters. The org.garret.fulltext.FullTextSearchHelper class provides a default implementation of these methods. To change the default behavior, create your own class derived from FullTextSearchHelper and override the corresponding methods.

A few more words about tuning search parameters. Perst calculates a document's rank using the following criteria:

  1. Word frequency (the number of occurrences of a word in the particular document).
  2. Occurrence kind (in-title, in-header, emphasized, in-bold, in-comments, ...). Occurrence kinds depend on the document text format and the lemmatizer. Perst just treats occurrence kinds as integer numbers from 0 up to 7; the interpretation of these numbers is up to the application, which should return the weight of each occurrence kind from the FullTextSearchHelper.getOccurrenceKindWeights() method. A word can belong to several categories at the same time (for example, in-title and in-bold). Perst associates with each word a bit mask of occurrence kinds (the mask type is byte, which is why the number of different occurrence kinds is limited to 8). The default lemmatizer doesn't specify any occurrence kind (the mask is zero), and the default weight of a word occurrence is 1. When specifying weights for your own occurrence kinds, make them greater than 1 if you consider the occurrence kind more important than a default word occurrence (for example, in-title), or less than 1 if it is less significant (for example, in-comments).
  3. Inverse document frequency (IDF). Perst calculates the number of documents relevant for each word of the query and assigns higher weights to less frequent words.
  4. Word nearness. If we are searching for "Santa Claus", then most likely we do not want to get a document that mentions "Santa Barbara" at the beginning and is signed by "Claus Petersen" at the end. This is where the nearness of the query words in the document should be taken into account. Perst allows you to vary the significance of the nearness criterion in the total document rank via the FullTextSearchHelper.getNearnessWeight() method. Perst uses the following formula to calculate the rank of a document: keywordRank*(1 + nearness*nearnessWeight). The default value of nearnessWeight is 10. Make it smaller if nearness of words is not so important in your application (users mostly search for independent notions). You can also make it possible for the user to adjust this parameter dynamically (if the implementation of the getNearnessWeight() method in your own class supports it).
  5. Order of words in the text. Words can be quite close to each other, but their order may be different, producing a completely different meaning. For example, documents containing the phrases "The computer is powered by a Pentium 4 processor" and "This server can contain up to 4 Pentium processors" have different relevance for the query "Pentium 4". Perst allows you to specify a penalty for swapped word order using the FullTextSearchHelper.getWordSwapPenalty() method. Its default value is also 10, but here it means that the calculated distance between the swapped words is multiplied by this penalty. In the example above, the distance between the words "Pentium" and "4" is 1 in both documents (the minimal distance, which yields the highest nearness value). Multiplying this distance by 10 gives a resulting distance of 10, so the nearness for the second phrase will be the same as for the phrase "Pentium Pro CPU, 4 Gb RAM".
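A quick numeric illustration of the rank formula cited above, keywordRank*(1 + nearness*nearnessWeight), using the default nearnessWeight of 10:

```java
// Numeric illustration of the document rank formula:
// rank = keywordRank * (1 + nearness * nearnessWeight)
public class RankFormula {
    public static double rank(double keywordRank, double nearness, double nearnessWeight) {
        return keywordRank * (1 + nearness * nearnessWeight);
    }

    public static void main(String[] args) {
        // With the default nearnessWeight of 10, nearness can boost the
        // keyword-only rank by an order of magnitude.
        System.out.println(rank(0.5, 1.0, 10)); // maximal nearness: 0.5 * 11 = 5.5
        System.out.println(rank(0.5, 0.0, 10)); // same keywords, no nearness: 0.5
    }
}
```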

A full text index in Perst is just another collection class, like B-Tree, KD-Tree and the other indices. To create a full text index, use the Storage.createFullTextIndex method. There are two overloaded methods with this name: one allows you to specify a FullTextSearchHelper instance, the other uses the default helper class.

To insert an object into the full text search index, you should either explicitly specify a stream with the object's text and the language of this text (the language may be needed to correctly perform stemming; the default Perst implementation doesn't perform stemming, so the language is ignored), or the object should implement the FullTextSearchable interface and provide its text and text language itself.

Full text queries can be specified either as a string (in which case Perst parses the query using the provided helper class) or in the form of a tree (the latter is useful for an advanced search form that allows the user to combine AND/OR predicates). More information on the full text search API can be found in the generated Perst API documentation. Also, please look at the tst/TestFullTextIndex.java example.
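To illustrate the tree form of a query, here is a minimal sketch of AND/OR/NOT nodes evaluated against a document's word set. The class names are hypothetical; the real query tree classes are part of the Perst full text search API:

```java
import java.util.Set;

// Illustrative query-tree sketch: each node evaluates against the set of
// words occurring in a document. Hypothetical classes, not the Perst API.
interface Query {
    boolean matches(Set<String> documentWords);
}

class Word implements Query {
    final String word;
    Word(String word) { this.word = word.toLowerCase(); }
    public boolean matches(Set<String> words) { return words.contains(word); }
}

class And implements Query {
    final Query left, right;
    And(Query left, Query right) { this.left = left; this.right = right; }
    public boolean matches(Set<String> words) { return left.matches(words) && right.matches(words); }
}

class Or implements Query {
    final Query left, right;
    Or(Query left, Query right) { this.left = left; this.right = right; }
    public boolean matches(Set<String> words) { return left.matches(words) || right.matches(words); }
}

class Not implements Query {
    final Query operand;
    Not(Query operand) { this.operand = operand; }
    public boolean matches(Set<String> words) { return !operand.matches(words); }
}
```

With these nodes, the query (MultidimensionalIndex OR KDTree) AND NOT RTree would be built as new And(new Or(new Word("MultidimensionalIndex"), new Word("KDTree")), new Not(new Word("RTree"))) - which is what an advanced search form could construct directly instead of producing a query string.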

The JDK 1.5 version of Perst also makes it easy to use full text search with the Database class. It is enough to mark the text fields of the objects which should be included in the full text search index with the FullTextIndexable annotation. When an object of such a type is stored in the database, Perst concatenates the string values of all these fields (each field value is converted to a string) and inserts the object into the full text index. Alternatively, the object may implement the FullTextSearchable interface and be responsible for extracting its own text. Unlike indexed fields included in normal indices, whose updates must be handled by the user by invoking excludeFromIndex prior to the update and includeInIndex after it, for fields included in the full text search index it is enough to invoke the Database.updateFullTextIndex method after the update. You can find an example of using a full text index with the Database class in tst/TestDbServer.java.

Perst+Lucene

Although Perst provides its own full text search capabilities, it is also possible to integrate Perst with the free open source Lucene search engine. By default, Lucene stores the full text search index in the file system, but it is also possible to store the Lucene index in a Perst database (using the Perst Blob class). Lucene works with an inverse index through the Directory interface. Default implementations of this interface include a file system directory and an in-memory directory. Perst provides its own implementation of the Lucene Directory interface, allowing the index to be stored directly in a Perst database.

The advantages of storing a Lucene index inside a Perst database are the following:

  1. A single data storage is used for persistent objects, normal (B-Tree) indices and Lucene full text search indices. This means only one file to administer, which significantly simplifies copy/backup/restore operations.
  2. The Perst transaction mechanism preserves database consistency (including the consistency of the Lucene full text search index) in case of application or system failure. When the Lucene index is stored in the file system, abnormal program termination can cause corruption of the full text search index.
However, a file system is more efficient than Perst Blobs for random access to large blocks of data - the type of access that Lucene actually uses. Consequently, the performance of a file system-based Lucene search engine can be somewhat better than one using Perst storage. Below are the results of a comparison of the native file system (WinXP/NTFS) and the Perst implementation of Lucene storage. This test indexes 10,000 documents containing 10 fields with 10 randomly generated words each:

    Operation       Lucene+file system    Lucene+Perst
    Build index     56 sec                84 sec
    Index search    114 sec               118 sec

To create Perst storage for a Lucene index, use the PerstDirectory class, which is located in the perst/lucene directory:

        Storage db = StorageFactory.getInstance().createStorage();
        db.open("file.dbs", pagePoolSize);
        Directory directory = new PerstDirectory(db);
Please refer to the Lucene programming API documentation for how to use this directory (it can be used like any other Lucene Directory implementation, for example the standard FSDirectory). For instance, to build or update the index you create an index writer:
        IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), create);
Please look at the PerstIndexer.java and PerstSearcher.java examples in the same directory. These examples can work with an index stored in a Perst database as well as in a standard file system (if the -fs option is specified).

To store references to Perst persistent objects inside the Lucene database, you can add a special field containing the object's OID (object identifier) to the document. The OID of a persistent object can be obtained using the Persistent.getOid() method, and vice versa - a persistent object can be accessed by its OID using the Storage.getObjectByOID() method.

But the most convenient way of integrating full text search inside Perst is to use the "Continuous" package. This package provides a transparent RDBMS-like interface, version control, optimistic locking and built-in full text search capabilities (based on the Lucene search engine). For more information please read the Continuous section in the Perst manual.

Continuous

Perst is an embedded database system which is intended to be used for the storage of personal application data. All the Perst algorithms were designed to provide the best performance when a single application is accessing a database. But nowadays, standalone applications are more and more being replaced by services - Web-based server applications, for example. In such cases, concurrent access and high scalability become the most important priorities.

Another important feature missing in Perst is full text search. In real life, strict search is rarely adequate - it is hard to assume that a user knows the precise name of a book or author. Certainly, implementing your own full text search engine is a very complex task, maybe even more complex than the implementation of Perst itself. Fortunately, there is a good, powerful, fast, well-known and free full text search engine for Java and .Net - Lucene. It would be nice if Perst and Lucene could work together.

Versioning is yet another useful feature required by many applications. Most informational systems dealing with important data (such as financial information) have to store all versions of their documents to be able to retrieve the state of the system at a particular moment in time. Also, such popular services as wikis are based on versioning. Perst provides some primitives for version control, but they were not integrated into the database: it is the responsibility of the programmer to maintain version histories and locate particular versions.

From my experience with Perst, I have found that the object-oriented model based on transparent persistence used in Perst is hard to understand for many people not familiar with object-oriented databases. That is why people prefer to use the org.garret.perst.Database wrapper class, which provides some abstraction of the traditional interface to a relational database system. Unfortunately, it is not possible to hide all the details from the programmer, and some operations, like excluding/including objects in indices, must still be performed explicitly.

The main idea behind the development of the Continuous package was to provide an extension of Perst which could address all the issues discussed above:

  1. Executes concurrent user transactions more efficiently.
  2. Integrates full text search capabilities.
  3. Implements a version control system which is as transparent as possible for the programmer.
  4. Provides a simple and convenient interface which can be used easily by people whose experience is mostly with relational database systems.

I hope that the approach provided by the Continuous package is actually a good compromise for reaching all these goals. What makes me think so is that the ideas used to solve one of these issues also help with the others. For example, multi-versioning is not only useful by itself, but also allows you to achieve isolation of client transactions without setting a large number of locks (which would result in a reduced level of concurrency and an increased probability of deadlock). Each client transaction works with its own snapshot of the database, as of the moment the transaction began, in isolation from other transactions. Only at transaction commit is a check performed to verify that changes made by the transaction do not conflict with changes committed by other transactions - so it is similar to optimistic locking. A simplified table-like data structure, similar to the one used in relational databases, makes it possible to hide the manipulation of indices and the full text search engine from the programmer.
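The commit-time conflict check described above can be modeled with a toy example. This is an illustration of the optimistic scheme in general, not the Continuous implementation; all names here are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of optimistic concurrency control: a transaction records the
// version of each object it read, and commit fails if any of those
// versions has changed in the meantime. Not the Continuous implementation.
public class OptimisticCommit {
    static Map<String, Integer> committedVersions = new HashMap<>();

    // Returns true if the transaction's snapshot is still valid and its
    // new versions are installed; false means a conflicting commit happened.
    static boolean commit(Map<String, Integer> readVersions, Map<String, Integer> newVersions) {
        for (Map.Entry<String, Integer> e : readVersions.entrySet()) {
            if (!e.getValue().equals(committedVersions.getOrDefault(e.getKey(), 0))) {
                return false; // conflict: somebody committed a newer version
            }
        }
        committedVersions.putAll(newVersions);
        return true;
    }
}
```

Two transactions that both read version 1 of the same object cannot both commit: the first one succeeds and installs version 2, and the second one detects the version mismatch and is rolled back instead of silently overwriting the first transaction's changes.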

The Lucene full text search engine can be used with Perst in standalone mode, where the Lucene indices are stored in a standard OS file system. This provides the highest level of performance; but since neither Lucene nor the file system guarantees the durability and consistency of the file data in case of a power or system failure, such an incident can corrupt the full text search index and necessitate its rebuild (which can take a significant amount of time). Continuous provides its own implementation of the abstract Lucene directory, allowing Lucene data to be stored directly in a Perst database. Since Perst BLOBs are optimized for sequential access to BLOB data, they cannot efficiently perform the random access operations used by Lucene, so performance in this case will be worse than with the standard Lucene FSDirectory (which stores data in OS files). But the main bonus of the Perst Lucene directory provider is that the Lucene indices become transactional: they survive application and system faults, and any inconsistency between normal and full text search data is not possible at all.

To create a full text search index for Perst objects using the Continuous package, you only need to mark the relevant class fields with the FullTextSearchable annotation:

class Address implements IValue 
{ 
    @FullTextSearchable
    private String country;

    @FullTextSearchable
    private String city;

    @FullTextSearchable
    private String street;

    @Indexable
    private int zipCode;
    ...
}

class Company extends CVersion
{
    @FullTextSearchable
    @Indexable(unique=true)
    private String name;

    @FullTextSearchable
    private Address location;    
    ...
}
When objects of these classes are stored in the database (i.e. the transaction is committed), the Continuous package automatically extracts the values of the fields marked with @FullTextSearchable, parses out the individual words, builds a document with this text and associates the document with the persistent object. You can search objects in the same way as with the standard Lucene API, and the search result will contain references to Perst objects.
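A sketch of the kind of reflection-based extraction described above. The @FullTextSearchable annotation is declared locally here purely for illustration, and the Company sample data is made up; the real annotation comes from the Continuous package:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Illustrative sketch of what Continuous does behind the scenes: gather
// the values of fields marked @FullTextSearchable into one text to be
// indexed. The annotation is declared locally; the real one comes from
// the Continuous package.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface FullTextSearchable {}

public class TextExtractor {
    static class Company {
        @FullTextSearchable String name = "Acme";
        String phone = "555-0100"; // not marked, so not indexed
        @FullTextSearchable String city = "Springfield";
    }

    public static String extractText(Object obj) throws IllegalAccessException {
        StringBuilder text = new StringBuilder();
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (f.isAnnotationPresent(FullTextSearchable.class)) {
                f.setAccessible(true);
                Object value = f.get(obj);
                if (value != null) {
                    if (text.length() > 0) text.append(' ');
                    text.append(value);
                }
            }
        }
        return text.toString();
    }
}
```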

The @Indexable annotation indicates that the field is included in a standard Perst B-Tree index (optionally unique).

Prerequisites of the Continuous package:

It is possible to build the package using either the compile.bat script in the perst/continuous/src directory or using build.xml in perst/continuous and the ant tool. The directory perst/continuous/tst contains two examples using the Continuous package. The bank example can also be used as a test for measuring the performance and scalability of Perst with the Continuous package.

Javadoc documentation of the Continuous package can be found here.

Tuning

This section contains several hints on how to adjust Perst parameters and increase database performance.

The speed of accessing data on disk is several times slower than the speed of accessing data in main memory. That is why caching of data is the key to increasing database performance. Perst uses a pool of pages to optimize access to the disk. The page pool size can be specified in the Storage.open method (by default it is 4 Mb). Usually, increasing the page pool size leads to better performance, but keep the points below in mind.

If you think that all your data should fit in main memory, you can use the Storage.INFINITE_PAGE_POOL constant in the Storage.open method. In this case, the page pool is dynamically extended when a new page is requested. As a result, all pages are cached and present in memory, so each page is read from the disk only once. Also, in this case a strong object cache is used instead of a weak one. This means that all fetched objects are also pinned in memory, and an object is unpacked only once. It is important to notice that the amount of main memory used will be greater than the database size: all objects will be present in memory both in packed form (inside the page containing the object) and in unpacked form (referenced from the object cache).

In some applications (such as those for mobile devices) persistence is not needed, but Perst container classes such as Link, Index, FieldIndex and SpatialIndex can still be useful. In this case, you can use the NullFile implementation of the IFile interface together with Storage.INFINITE_PAGE_POOL to create a transient, in-memory database. Data in this case will never be written to disk.

There are some constants defined in the StorageImpl class which influence the initial and maximal database size. If you want to change them, you will have to rebuild Perst. Below is a detailed description of these parameters:

    dbDefaultInitIndexSize (default value: 1024)
        Initial object index size. The object index is increased on demand. Reallocation of the index is an expensive operation, so to minimize the number of such reallocations, the object index size is always doubled. Specifying a larger initial index size reduces the number of future reallocations and so increases performance a little bit, but it also leads to a larger initial size of the database file. With the default value of this parameter, the initial database size is about 50 kb.

    dbDefaultExtensionQuantum (default value: 4Mb)
        Database extension quantum. Memory is allocated by scanning a bitmap; if no hole large enough is found, the database is extended by this value. Increasing the value of this parameter leads to less frequent rescanning of the allocation bitmap from the very beginning. This gives faster allocation speed and better locality of reference for created objects (because there is a greater chance that they will be allocated sequentially). On the other hand, it leads to less efficient memory usage. Reducing the value of this parameter forces the reuse of existing holes in the memory allocation bitmap.

    dbObjectCacheInitSize (default value: 1319)
        Size of the object cache. Perst needs this cache to check whether an object with a given OID is already present in memory. The cache uses weak references to allow the garbage collector to do its work. When some fill threshold is reached, the cache is reallocated by doubling its size. Once again, increasing this parameter can save a number of cache reallocations.

Now, some hints on how to increase Perst performance and reduce the size of used main memory:
If your database performs a lot of updates of persistent data, then the main limiting factor is the speed of writing changes to disk, especially the synchronous write performed by commit. If you do a commit after each update, then the average speed will be about 10 updates per second (this estimate is based on the assumption that average disk access time is about 10 msec and each transaction commit usually requires writing about 10 pages at random places in the file). But it is possible to dramatically increase update performance if you group several updates into one transaction. Perst creates a shadow of an object when it is first updated inside a transaction. If an object is updated once in each of n transactions, then n shadows are created; if it is updated n times inside one transaction, a shadow is created only once. This explains the advantage of one large transaction.

But if you update a large number of objects in one transaction, a shadow is created for each updated object, which leads to a significant increase in the database file size. If you update every object in the database inside one transaction, the database size will almost double! If instead you perform each update in a separate transaction, the size of the database will remain almost unchanged (because the space of allocated shadow objects is reused). So the best approach is to perform a commit after every 100 or 1000 updates: this reduces the per-commit overhead and minimizes the database size.
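The batched-commit pattern described above can be sketched as follows (MyRecord, its counter field and the recordsToUpdate collection are hypothetical):

```java
int count = 0;
for (MyRecord rec : recordsToUpdate) {
    rec.counter += 1;
    rec.modify();          // mark the object as changed (shadow is created once per transaction)
    if (++count % 1000 == 0) {
        db.commit();       // one synchronous write per 1000 updates
    }
}
db.commit();               // commit the remaining tail of the batch
```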

If your persistent objects form a tree or graph where all objects can be reached by reference from the root object, then once you load the root object into main memory and store a reference to it in some variable, the GC will never be able to collect any loaded persistent object (because all of them are reachable from the root). So as your application accesses more and more objects from the storage, at some point all of them will be loaded into main memory, which can cause memory exhaustion. To prevent this situation, avoid storing in variables references to container objects that reference a large number of members; this is especially true for the root object. Then the GC is able to do its work and evict from memory objects that are not currently used (objects to which there are no references from local variables). It is important to note, however, that objects accessible through an index by key can be deallocated by the garbage collector as usual, so no special care is needed in that case.

Tricks and Tips

When to use what collection structure
Link
To be used for relatively small collections (fewer than about 100 objects).
Relation
Is essentially a Link with the addition of being a 1-n relation. The Projection class can be used for 'query like' purposes.
FieldIndex
To be used for large collections (more than about 100 objects), indexed on a known attribute or several attributes (more than one attribute constitutes a 'composite key'). FieldIndex is implemented using a B+Tree. The B-Tree page size is 4 kb, so the minimal space occupied by the index is also 4 kb; thus, care should be taken as to when and where it is used.
Index
To be used for large collections (more than about 100 objects). Indexing is done while adding objects to the collection. Index is implemented using a B+Tree.
MultidimensionalIndex
Is useful for query-by-example searches, including range queries, or when the search criteria include different restrictions for different fields. The index is implemented using a KD-Tree.
IPersistentMap
To be used for small or large maps. For small maps, a sorted array is used. For large maps a B-Tree is used. It implements the java.util.Map interface.
BitIndex
Used to locate objects by a set of their boolean properties.
IPersistentList
Provides efficient random access to a very large sequence of objects by using the integer index (position).
IPersistentSet
Very convenient for storing objects in a set (there can be only one instance of an object in the set). IPersistentSet is implemented using a B+Tree.
SortedCollection
Is the best for in-memory databases with complex user-defined keys. It is implemented using a T-Tree (a structure optimized for in-memory databases) and does not store values of keys inside the T-Tree pages, using the provided comparator methods instead.
SpatialIndex
Provides quick access to objects with two integer coordinates (such as spatial data). SpatialIndex is implemented using Guttman's R-Tree with the quadratic split algorithm.
SpatialIndexR2
Provides quick access to objects with two real coordinates (such as spatial data). SpatialIndexR2 is implemented using Guttman's R-Tree with the quadratic split algorithm. Unlike SpatialIndex, it doesn't pin all its pages in memory and so is able to handle larger collections.
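As a quick illustration of the most common choice, a FieldIndex over a class attribute might be used as in this sketch (the Person class and its fields are assumptions):

```java
import org.garret.perst.*;

class Person extends Persistent {
    String name;
    int age;
}

// Unique index on the "name" field; the key is extracted from the object itself.
FieldIndex<Person> byName = db.createFieldIndex(Person.class, "name", true);

Person p = new Person();
p.name = "Alice";
p.age = 30;
byName.put(p);

Person found = byName.get(new Key("Alice"));
```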

When to use values
When a class is to be treated as a storable object *within* another class, it can implement the IValue interface. However, care should be taken never to allow an IValue attribute to be null. Example: TimeSeriesTick in the TimeSeriesTest.

Values are always stored in the context of the persistent object referencing them, and only those fields known from the declared type of the field are stored. If you assign to a value field an object of a class derived from the field's declared class, the fields added by the derived class will not be saved.

Can the Key class be stored in the database?
The Key class is not persistence-capable and thus cannot be stored. If you want a key readily available, make the field transient and instantiate it in the onLoad() method or in the default constructor of the class.

How to define constructors for persistent objects?
Perst always uses the normal default constructor of a class when loading objects. This implies that when you need to initialize attributes that should persist, you have to either:
  • Check the OID in the default constructor to determine whether the object already exists in the database (a non-zero OID means it does), or
  • Leave the default constructor as it is and introduce an alternate constructor that is called only on new object creation.

Initialization of transient or resettable attributes should be done in the default constructor, or in the onLoad() method that is called when an object is retrieved from the database.
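A sketch of this two-constructor pattern (the Account class and its fields are illustrative):

```java
import org.garret.perst.Persistent;

class Account extends Persistent {
    long balance;              // persistent attribute
    transient Object lock;     // transient attribute, rebuilt on every load

    // Default constructor: invoked by Perst when the object is loaded,
    // so it should initialize only transient state.
    Account() {
        lock = new Object();
    }

    // Alternate constructor: called only when a new object is created.
    Account(long initialBalance) {
        this();
        balance = initialBalance;
    }
}
```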

When is the redefinition of recursiveLoading method needed?
By default, Perst recursively loads all referenced objects. Hence, once you access some persistent object, the complete closure of persistent objects, directly or indirectly reachable from this object by reference, is loaded. So, in theory, loading of the root object can cause loading of all the objects in the database. If this is not desired (because loading of all objects will take a significant amount of time or because it will cause memory exhaustion), then you should stop recursion by redefining the recursiveLoading method in some classes. You should explicitly load all objects referenced from the object with disabled recursive loading.

In practice the situation is better than it sounds: all Perst collection classes load their members on demand. Once you have a reference to, for example, a FieldIndex, only the header of this collection object is loaded, and the members are loaded on demand. The same is true for all the other Perst collections: Link, Relation, Index, SortedCollection, IPersistentSet. Since persistent objects in a database are usually reached through collections, in most cases explicit loading of persistent objects is not needed. But be careful: if, in addition to placing all your objects in some index, you also link them together, for example in an L2-List, then fetching a single object will cause loading of all the other objects in that collection!
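Cutting off recursive loading might look like this sketch (the Folder class and its getParent accessor are hypothetical):

```java
import org.garret.perst.Persistent;

class Folder extends Persistent {
    String name;
    Folder parent;             // would otherwise be loaded recursively

    public boolean recursiveLoading() {
        return false;          // stop implicit loading at this class
    }

    Folder getParent() {
        if (parent != null) {
            parent.load();     // objects must now be loaded explicitly
        }
        return parent;
    }
}
```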

When to commit a transaction?
The commit operation requires a synchronous write to disk. This is a very slow operation (modern disks are not able to perform more than about 100 such operations per second). So, to improve performance, it is better not to commit too frequently. But you should realize that if some problem occurs (the application or system crashes, or the power fails), all uncommitted data will be lost.

How to deallocate objects
Perst doesn't automatically exclude deleted objects from any indices containing them. So, it is the responsibility of a programmer to remove an object from all indices before it is deallocated.
Perst also doesn't remove any object referenced from the removed object. If this is needed, the programmer should do it himself.
Explicit deletion of objects can cause two problems:
  • Dangling references (references to the removed objects).
  • Garbage in the database (unreferenced objects).
The first problem is the most critical, since it can corrupt database data. To prevent it, it is strongly recommended to use the Perst garbage collector instead of explicit memory deallocation. If this is not possible (for performance or other reasons), the garbage collector can still be used for debugging, since it is able to detect both problems: it throws a StorageError(StorageError.INVALID_OID) exception if a reference to a deleted object is found, and returns a non-zero number of collected objects if there is garbage in the database.

Why does an application that works normally in single-threaded mode get assertion failures or other exceptions when the database is updated by more than one thread?
Perst does not itself synchronize the application's access to persistent objects. It is the responsibility of the application to set proper locks to avoid concurrent access to the same object. Just using beginThreadTransaction and endThreadTransaction is not enough for correct concurrent work with persistent objects. The following alternatives are possible:
  1. Access a database from only one thread.
  2. Access a database from multiple threads, but use a global lock to provide mutual exclusion: lock a mutex, perform some operations with the database, commit the transaction and unlock the mutex.
  3. Use per-thread transactions in the EXCLUSIVE_TRANSACTION mode. This approach is similar to 2), but Perst will provide exclusion of threads itself.
  4. Use per-thread transactions in the SERIALIZABLE_TRANSACTION mode + object level locking + alternative B-Tree implementation.
Please notice that in alternatives 1-3 only one thread accesses the database at any moment, so this is not really concurrent execution. But that does not mean approach 4 gives the best performance, because of the locking overhead and because the alternative B-Tree implementation is less efficient than the original one for very large databases. Approach 4 is also the most difficult to program, because setting proper locks is the responsibility of the programmer and incorrect locking can cause synchronization problems: race conditions or deadlocks (two or more threads mutually blocking each other).

Please look at the tst/TestServer.java example which illustrates how to use the per-thread transactions and locking when a database is concurrently accessed from multiple threads.
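Alternative 3, for example, reduces to bracketing each unit of work in an exclusive per-thread transaction; a sketch (root, its index field and the key are assumptions):

```java
// Perst itself serializes threads in EXCLUSIVE_TRANSACTION mode.
db.beginThreadTransaction(Storage.EXCLUSIVE_TRANSACTION);
try {
    root.index.put(new Key(id), record);   // any reads/updates go here
} finally {
    db.endThreadTransaction();             // commits and releases the exclusion
}
```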

Why is there a significant slowdown in the speed of inserting new records in a database when the size of the database is increased? How is it possible to improve insertion performance?
The larger a database is, the higher is the probability of a page cache miss. In other words, when a database is small, all pages are cached in the page pool and no disk accesses are needed at all. Once the database becomes larger than the page pool size, some of the pages have to be thrown away from the cache (and also loaded on demand). When the database size is increased, the probability of locating a requested page in the page pool becomes smaller and hence, the number of disk reads as well as the time taken for database operations are increased.

If you insert keys into the index in random order, the loaded pages will most likely be located at random offsets within the database file, so accessing them requires positioning of the disk head. The average positioning time for modern disks is about 10 msec, which means a disk can perform about 100 accesses per second. If a database is so large that each B-Tree operation requires loading all the pages on the path from the root to a leaf (for a large number of objects the path will be at least 3 pages), then the database will be able to perform about 30 inserts (or searches) per second. Inserting 1 million objects in this worst-case scenario would take about 10 hours! Fortunately, real performance is significantly better (because of page cache hits, the file system cache, and so on).

There are two obvious ways of improving performance:

  1. Increase the page pool size (but it should not be larger than the available physical memory, otherwise you will get swapping and performance degradation).
  2. Insert objects into the index in sorted order. In this case the same B-Tree pages are accessed for subsequent inserts, and the number of disk accesses decreases dramatically. If you are importing data from a text file, sort it first by the index key using the "sort" utility. If this is not possible, objects can be loaded into memory, stored in an array, and sorted using the standard JDK Arrays.sort method. If the number of objects is too large to fit in memory, load as many objects as fit in memory, sort them and insert them into the index; then load the next portion of objects, sort and insert them, and so on.
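The sort-then-insert technique from point 2 can be sketched as follows (MyRecord, its key field and the loadNextBatch helper are hypothetical):

```java
import java.util.Arrays;

MyRecord[] batch;
while ((batch = loadNextBatch()) != null) {            // as many records as fit in memory
    Arrays.sort(batch, (a, b) -> a.key.compareTo(b.key));
    for (MyRecord r : batch) {
        index.put(new Key(r.key), r);                  // sequential keys hit the same B-Tree pages
    }
    db.commit();                                       // one commit per sorted portion
}
```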

What is the most efficient way of storing multimedia data?
Usually databases with multimedia data (images, video, texts . . . ) are very large and most of the space is taken by the BLOB (binary large object) storing this multimedia data. To be able to efficiently perform a search in such a database it is necessary to separate multimedia data and the metadata describing it (name, description, keywords, categories . . .). The size of the description data is much smaller than the size of the multimedia data itself, so it can completely fit in the memory (in the page pool) and be searched very fast.

Perst provides four approaches which should be used together to provide the best performance for multimedia applications:

  1. Blob class: allows efficient storage of large binary objects.
  2. Custom allocator: allows the allocation of instances of specified classes in a special way (in separate space).
  3. Multifile: a virtual file consisting of several physical files (segments). Each segment can be placed on a separate disk, making it possible to create very large databases and what is more important in our case: place BLOB objects in separate files.
  4. LRU page pool limit: prevents the caching of BLOB pages. By default Perst uses the LRU algorithm for finding a candidate for replacement. But for BLOBs this strategy is not optimal and fetching a BLOB can cause the whole page pool to be flushed if LRU discipline is used. There is a high probability that the fetched BLOB pages will not be used any more. Hence, it is preferable not to cache BLOB pages at all (throw away such a page immediately when it is not used any more). Prohibiting the caching of pages in the file having offsets larger than some threshold value in conjunction with using a custom allocator for the BLOBs makes it possible to switch off caching for BLOB pages.

Now let us see how to use these four approaches. BLOBs are created using the Storage.createBlob method and allow incremental writing and reading of their content. To create a multifile, you should create a multifile description file describing its segments. Each line of this file contains the path to a physical file and the size of that segment (in kilobytes). The segment size should be aligned on the database page boundary (it should contain an integer number of pages). For the last segment, the size can be omitted. The file can look like this (tst/testalloc.mfd):

    "testalloc1.dbs" 1125899906842624
    "testalloc2.dbs" 
To open a multifile, pass the Storage.open method the path to the multifile description file, prefixed with the '@' character.
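For example (the page pool size here is a placeholder):

```java
Storage db = StorageFactory.getInstance().createStorage();
db.open("@testalloc.mfd", 4 * 1024 * 1024);   // '@' marks a multifile description file
```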

Please notice that space in the multifile segments is allocated on demand, so it is possible to specify a very large segment size without the risk of running out of disk space. In this way, we can place BLOBs in separate files. For example, in the testalloc.mfd description shown above, the size of the first segment is set to 2^50 kilobytes, i.e. 2^60 bytes. The first segment will be used to store all database objects except BLOBs. Since 2^60 is a very large number, we need not fear that the space in the first segment will ever be exhausted (disk space runs out much earlier).

The custom allocator is a persistent object implementing the org.garret.perst.CustomAllocator interface. Programmers can provide their own implementation of this interface or create a bitmap allocator using the Storage.createBitmapAllocator method. This method takes four parameters: the size of the allocation quantum, the base address of the allocator, the extension quantum, and the limit on allocated space. Set the allocator's base address equal to the base address of the multifile segment ((1L << 60) in the case of the multifile definition above). Then it is necessary to register the allocator using the Storage.registerCustomAllocator method, which associates the allocator with a particular class: space for all instances of this class, or of classes derived from it, will be allocated using this allocator. To separate BLOBs from other data, we bind this allocator to the Blob class.
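Putting this together for the multifile layout above, the allocator setup might be sketched as follows (the quantum and extension values are illustrative assumptions):

```java
CustomAllocator blobAllocator = db.createBitmapAllocator(
    512,               // allocation quantum in bytes
    1L << 60,          // base address: start of the second multifile segment
    1024 * 1024,       // extension quantum
    Long.MAX_VALUE);   // no limit on allocated space

// All Blob instances (and subclasses) are now allocated in the second segment.
db.registerCustomAllocator(Blob.class, blobAllocator);
```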

And now the last thing: we want to disable caching for BLOB pages. Perst has the "perst.page.pool.lru.limit" property, which disables caching for all pages in the file with an offset larger than the specified limit. The default value of this parameter is 2^60 (1L << 60), meaning that all pages with an offset larger than 2^60 will not be cached in the page pool. Hence, in our example, since we allocate BLOBs in the segment starting at address 2^60, caching of BLOB pages is disabled automatically (there is no need to set the "perst.page.pool.lru.limit" property explicitly).

The approaches described in this section may seem complicated, but actually they are not so difficult. Please have a look at the tst/TestAlloc.java example, which illustrates how files can be stored in a Perst database. You will see that not much effort is needed to store BLOBs efficiently.

How to create a read-only compressed database
Perst provides CompressedFile, which uses the ZLIB compression library. This file can be used only in read-only mode, so the database must be prepared in advance.

First, create and initialize a database. Then use the org.garret.perst.CompressDatabase command-line utility to compress the database file. The first, mandatory parameter specifies the path to the database file; the second, optional parameter specifies the compression level. The utility creates a compressed file with the ".dbz" extension in the same directory as the original file. It splits the file into 128 kb blocks and compresses each block separately. You can open a compressed file with a standard zip utility; inside you will find several entries: 000000000000, 000000131072, 000000262144, and so on. Each entry corresponds to a 128 kb block of the original file; if you extract and concatenate all of them, you get the original database file. When the compressed file is ready, you can open it using org.garret.perst.CompressedFile.

Please see the TestBlob example. First fill the database by running TestBlob. Then compress the database using the CompressDatabase utility. After this you can run "TestBlob zip", which extracts data from the compressed database. In this example, the original database file size is 315392 bytes and the compressed database is 24436 bytes.

How do I protect information stored in the database?
Perst supports database encryption, using a special implementation of the database file which performs encryption/decryption of the data with a provided cipher key. Perst uses the RC4 encryption method. Encryption is performed at the page level: a page is encrypted before it is written to disk and is decrypted when it is loaded into the page pool. The database is opened using a special version of the Storage.open method which takes a cipher key:
    public void open(String filePath, long pagePoolSize, String cipherKey);
Please note that if in addition to using data encryption, you also want to use database compression, then you should use a specialized implementation of the database file instead of using the Storage.open method with the cipher key parameter. This is described in the following section.
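Opening an encrypted database might look like this sketch (the file name and key string are placeholders):

```java
Storage db = StorageFactory.getInstance().createStorage();
// Pages are encrypted with RC4 using the supplied cipher key:
db.open("mydatabase.dbs", 4 * 1024 * 1024, "my-secret-key");
```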

How do I reduce database size?
If your database contains a lot of ASCII strings, then you can reduce by almost half the space used by strings through use of UTF-8 encoding for strings instead of UTF-16 encoding. Encoding can be set with the "perst.string.encoding" property. In UTF-8 encoding, ASCII characters are represented by one byte instead of the two bytes used in UTF-16. Perst.Lite detects ASCII strings and applies the one-byte-per-character format automatically.
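Assuming the Storage.setProperty method of the standard distribution, the encoding is selected before the storage is opened, e.g.:

```java
Storage db = StorageFactory.getInstance().createStorage();
db.setProperty("perst.string.encoding", "UTF-8"); // must be set before open
db.open("mydatabase.dbs", 4 * 1024 * 1024);
```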

If your database contains a large number of small objects, then per-object space overhead can significantly increase database size. Each Perst object has an 8-byte object header. Also, Perst uses a bitmap memory allocator for allocating database space; Perst's allocation quantum is specified in StorageImpl.dbAllocationQuantumBits and is 32 bytes by default. Therefore, if your object contains only one field of type int (4 bytes), then the size of the object, including the header, is 4+8=12 bytes, but the bitmap allocator allocates 32 bytes for it on disk. So instead of just 4 bytes, Perst uses 32 bytes of storage.

The only way to eliminate this object header overhead is to use the TimeSeries class, so that small fixed-size objects will be allocated in blocks containing large numbers of elements. But a drawback to this method is that it is impossible to access time series elements directly - you can only locate an element or range of elements by timestamp (any monotonically increasing key).

In addition, to reduce internal fragmentation caused by the memory allocator quantum, you can try to reduce quantum value (but as a result, the size of the memory allocation bitmap will increase, and the allocator speed will decrease).

The most radical way to reduce database size is to use database compression. Perst supports two approaches to compression: read-only compression performed by a special utility, and online compression supported in the JDK 1.5 version of Perst, and in Perst.Lite with the RMS storage layer. Compression is performed at the page level. Database pages are compressed when they are written to disk, and decompressed when they are loaded from disk into the page pool. Compression will cause additional CPU load, but database size can be significantly decreased (the actual compression ratio depends on the stored data -- the typical reduction is about 3 times). Despite the increased CPU demands, compression can actually improve performance, because a larger portion of the database may be cached by the OS file system cache. To perform online database compression in the JDK 1.5 version of Perst, use a special implementation of the database file, CompressedReadWriteFile:

    Storage db = StorageFactory.getInstance().createStorage();
    db.open(new CompressedReadWriteFile("mydatabase.dbs", CIPHER_KEY), PAGE_POOL_SIZE);        
In Perst.Lite, use the perst-rms.jar library and CompressedFile class:
    db = StorageFactory.getInstance().createStorage();
    db.open(new CompressedFile("mydatabase.dbz", new JZlibEncryptedCompressor(cipherKey)), PAGE_POOL_SIZE);
Please note that in the examples above a cipher key is also specified. Although database compression and encryption are independent of one another, and Perst can provide encryption without compression, compression must take place before encryption; otherwise compression has almost no effect on database size, since encryption eliminates regularities in the data stream. This is why the compressed database files also provide data encryption (if you do not want encryption, just pass null as the cipher key, or use JZlibCompressor instead of JZlibEncryptedCompressor).

What are the reasons for the OutOfMemoryException exception in Perst applications?
There are six major reasons for the memory overflow condition in Perst:

  1. The finalization thread is not able to perform finalization of objects in time. It is actually the Java runtime's responsibility to avoid the OutOfMemoryException in such cases, by suspending user threads so that the finalization and garbage collection threads can complete their work and free enough memory. But with some JVM implementations, in very rare situations under heavy load, this can unfortunately happen. It can only happen in the standard version of Perst, not in Perst.Lite, since finalization is not supported in J2ME (Java ME).

    Recommendation: increase the memory limit for the application (-Xmx option in Sun JVM).

  2. The threshold for the number of objects pinned in the LRU object cache is too large; the pinned objects can cause memory overflow, especially if they are large. By default, Perst uses an LRU object cache based on weak references: some number of the most frequently used objects is pinned in memory, while the other objects are referenced through weak references, which allow the garbage collector to deallocate an object once there are no more strong references to it. The maximum number of pinned objects can be set using the "perst.object.cache.init.size" property ("perst.object.cache.pin.count" in Perst.Lite). The default value is 1319 for the standard Perst version and 100 for Perst.Lite.

    Recommendation: decrease pinned object limit.

  3. The application has specified a large page pool size. As the database grows, more pages are cached in the pool, which eventually leads to the memory overflow. The page pool size is specified in the Storage.open method. The default value is 4Mb for standard Perst and 64kb for Perst.Lite.

    Recommendation: decrease size of page pool. Please also read the recommendations concerning the choice of proper page pool size in the Tuning section.

  4. The application is modifying (updating or inserting) a very large number of objects in a single transaction. Since finalization is not supported in J2ME, Perst.Lite has to pin all modified objects in memory. In the standard Perst, modified objects are pinned in memory when using serializable transactions.

    Recommendation: split large transactions into several smaller ones.

  5. Your persistent objects contain direct references, which make it impossible for the GC to deallocate them. For example, if you include an object in a singly-linked list, your class would have a "next" field:
    class MyClass {
         MyClass next;
         ...
    }
    
    When such an object is loaded into memory, it contains a reference to the next object, which will be recursively processed and also loaded into memory; and the next one, and the next one... As a result, the entire chain of objects is loaded into memory. If a single reference to the first item of the list is kept in some program variable, none of the objects can be deallocated by the GC, since all of them are "strongly" referenced.

    You can avoid loading all objects in the linked list by explicitly cutting off recursion. Override the Persistent.recursiveLoading method in MyClass and handle loading of instances of this class explicitly using the Persistent.load method. However, if the reference to the first item in the list is pinned, this approach will not help solve the problem of all loaded objects being pinned in memory. Generally speaking, the solution is to use direct references only to implement object associations in small groups, and not use references with large data collections. In the latter case, you should use the Perst collection classes which load collection members on demand and do not keep strong references to objects. That allows the GC to deallocate them.

    Recommendation: use Perst collection classes instead of self-made collections.

  6. Your application keeps references to inserted or loaded persistent objects in program variables, which also prevents the GC from deallocating these objects. This case is similar to reason 5 above; the difference is that here the references are kept in your own variables, including fields of other transient objects (reason 5 describes references kept inside persistent objects). Also note that this scenario cannot happen without reason 5 also occurring: if persistent objects reference each other but your application does not keep a reference to any of them, the GC would be able to clean them up.

    Recommendation: avoid references to persistent objects from program variables.