The main advantage of Perst is its tight integration with programming languages. There is no gap between the database and the application data models - Perst directly stores language objects. So there is no need to pack or unpack code, an operation which has to be performed for traditional relational databases. Also Perst (unlike many other OODBMS) requires no special compiler or preprocessor. Yet, it is able to provide a high level of transparency.
Persistent base class is considered persistence-capable. It is automatically made
persistent and stored in the storage when it is referenced from some other
persistent object and when the
store method of that object is invoked.
So there is no need (but it is possible) to explicitly assign objects to the
storage.
The storage has one special root object. The root
object is the only persistent object accessed in a special way (using Storage.getRoot method). All other persistent objects are accessed in the normal way:
either by traversing by reference from another persistent object, or by using
object containers (Index,
Link or Relation). Unlike many other OODBMS, there can be only one root in the
storage. If you need to have several named roots, you should create an Index object
and use it as the root object.
Before version 4.0, Perst required that each persistence-capable class be
derived from the Persistent class. This made it impossible to store
‘foreign’ classes in the storage, and was the ‘price to pay’ for Perst’s
ease of use and the absence of any specialized preprocessors or compilers.
Starting from version 4.0, Perst allows ANY class, even those not
implemented using the IPersistent interface, to be stored in the database.
It is still possible to derive persistent classes from Persistent, and such
classes will even be handled more efficiently than "foreign" classes not
derived from Persistent. New rules surrounding persistence-capable objects
are as follows:
IValue interface) are not stored as
separate objects but, instead, are embedded inside persistent objects
referencing them. This is also true for strings and wrappers of all
primitive data types.
java.util.ArrayList class, which are used to implement 1-N relations in
applications. This is why standard collection classes (java.util.* package)
are serialized by Perst in a special way: as embedded collections. There is
no separate persistent object in the storage representing such collections,
but they are stored inside the persistent objects referencing them. Map
classes are currently not supported (since the standard Java implementation
of map is very inefficient for disk databases). It is highly recommended to
use specialized Perst collections for this purpose.
IPersistent interface are able to control recursion.
Classes not implementing the IPersistent interface are recursively
stored on infinite depth (which can cause stack overflow in case large
linked object clusters).
Persistent have no proper finalizer,
Perst has to pin all such modified objects in memory. This limit the maximum
size of transactions (but it is possible to explicitly save an object in the
storage, in which case it can be thrown away from memory).
Components of persistence-capable objects are restricted to the following types:
boolean, int, short, double, ...
java.lang.String type
java.util.Date type
Persistent or any interface extending the IPersistent interface.
IValue interface with the same restrictions on the types of components as
for persistence-capable objects. If perst.implicit.values property is true, then any class which is not derived from IPersistent will be treated as a value. Values are stored inside the persistent object
containing them. The value class should have a default constructor (a constructor
with an empty parameter list) or no constructors at all. The declared type of
the field referencing value should be the same as the actual type of referenced
object (so it is not possible to assign an object of the derived class to this
field). Also, the value class should not have recursive references (direct or
indirect) – e.g. field of value class V cannot have type V.
Storage.createLink method.
java.lang.util package
are stored as embedded collections in the persistent object, referencing them.
So such collections and maps are not independent persistent objects themselves, them are not assigned OID
and can not be referenced from more than one persistent object.
Collection and maps classes defined by applications are
treated as any other persistent objects unless you set "perst.serialize.system.collections" property to false.
Unfortunately, it is not possible to detect if an object is
changed or not without saving the old state of the object and performing a
field-by-field comparison with the new state of the object. But, the overhead
of such a solution (both in terms of space and CPU resources) is very high. In
Perst it is the responsibility of the programmer to save objects in the
storage. This can be done by the Persistent.store or Persistent.modify methods.
The Persistent.store method writes an object, as well as all objects which are not yet
persistent but referenced from this object, to the storage. So, if you create a
tree of objects and assign a reference to the root of this tree to some
persistent object X, it is only necessary to invoke the store() method in this object X. But, if you update one of the elements in this
tree, you should invoke store() individually for each such element (X.store() will NOT
automatically save referenced objects now).
The Persistent.modify method marks an object as modified but doesn't immediately write it
to the storage. All objects marked as modified will be stored in the storage
during transaction commit (store method will be invoked for each modified object). So, using the modify method is preferable if an object is updated multiple times within a
transaction. In this case, instead of storing it several times, it will be
stored only once - at the end.
Perst doesn't support nested non-static classes. The reason is that the non-static class contains a final field pointing to the outer class. As far as a field is final, it is not possible to assign a value to it during object loading in the same way as for all other fields of a persistent object.
The IPersistent interface declares the recursiveLoading method. The default implementation of this method in the Persistent class always returns true. In this case, Perst will recursively load any object referenced by
a target object when the target object is loaded. Thus, it causes implicit
loading of the cluster of all referenced objects from storage to the main
memory. This is similar to how a serialization mechanism works.
To avoid an overflow of the main memory caused by recursive
loading of all objects from the storage, a programmer has to overload recursiveLoading method in some classes and return false in it. Objects
referenced by such an object will not be implicitly loaded and the programmer
has to explicitly invoke Persistent.load method to load them. So, the recursiveLoading method
can be used to control loading of objects from storage to main memory.
Also, it is important to notice that containers (Link, Relation, Index) always load member objects on demand (they do not perform
automatic loading of all objects in the containers). Since access to the
container members is always performed through methods of the container class,
the programmer will never notice it - container methods will always return
loaded object(s).
Perst uses a default constructor (constructor without parameters) to create an object loaded from storage. This means that:
sun.reflect.ReflectionFactory. But this will work only with Sun's JDK.
onLoad method which is
invoked after fetching all values of the non-transient fields of the persistent
object from the storage.
So, summarizing all of the above, the proposed mechanism is
convenient and easy to use because it doesn't require the programmer to
explicitly load any referenced object. From another point of view, it is
flexible by providing the programmer control on object loading. Only those
classes (usually containers) which explicitly control loading of their members
(by overloading recursiveLoading to return a false value) should be aware about calling the Persistent.load method.
int type to long or
to float. But changing the type from int to
short is not possible. More precisely, Perst is able to perform any
conversion which is supported by Java reflection mechanism (field type XXX can
be changed to YYY if the java.lang.reflect.Field.setXXX method can be applied to the component with type YYY).
All other schema modifications (such as renaming fields,
incompatible change of field type) cannot be handled by the Perst automatic
schema modification mechanism. In this case, you can use the Perst XML
export/import mechanism. A database can be exported to XML format using the Storage.exportXML method and then recreated with new class definitions. After it is
saved, data can be imported using the
Storage.importXML method.
Link interface to deal with relations of this kind. This interface allows you to
add/delete/inspect members of the relation. Members of the relation can be accessed
by index or be extracted as an array of objects.
Relations can be of two kinds: embedded (where
references to the related objects are stored in relation to the owner object
itself) and standalone (where the relation is a separate object, which
contains the references to the relation owner and the relation members). Both
kinds of relations implement the Link interface. An embedded relation is created by the Storage.createLink method and a standalone relation is represented by the persistent class created by the Storage.createRelation method.
So, a relation between two classes A and B can be implemented in the following way:
| Relation type | Object A | Object B |
|---|---|---|
| one-to-one | B bref; | A aref; |
| one-to-many | Link bref; | A aref; |
| many-to-one | B bref; | Link aref; |
| many-to-many | Link bref; | Link aref; |
Hashtable or the HashMap class can be used for this purpose. In databases, usually a more
sophisticated search is required. I do not want to implement the complete SQL
language in Perst, because it immediately makes the DBMS huge and slow. But, in
most cases, an application performs only very simple queries using an exact
match or a key range. This is done in Perst by Index and IndexField interfaces. The first interface is used for an independent specification key
and its associated value. IndexField interface allows you to index objects using one of the fields of
this object (key field).
Indices are created in Perst using the Storage.createIndex or the Storage.createFieldIndex methods. There can be several index implementations but, right now,
only one implementation based on B+Tree is provided (because B+Tree is the most
efficient structure for disk-based databases). Methods of the Index and the FieldIndex interfaces allow you to add, remove and locate objects by the key.
It is possible to perform a search either by specifying the exact key value or
by specifying a range of key values (the high or low boundary or both of them
can be skipped or can be declared as being exclusive or inclusive). So it is
possible to perform the following types of search:
There are several different ways of selecting objects using index:
IPersistent
get(Key key)
Object[]
get(Key from, Key till)
null. Both
boundaries can be inclusive or exclusive.
Iterator
iterator()
Iterator
iterator(Key from, Key till, int order)
null. Both
boundaries can be inclusive or exclusive. Objects can be traversed in the
ascending or descending order of their keys.
If you need a set of persistent objects you should use the Storage.createSet method. Set is implemented using B+Tree where the object id (OID)
is used as a key.
Perst also supports spatial indices (org.garret.perst.SpatialIndex) and generic indices with a user-defined comparator (org.garret.perst.SortedCollection). The spatial index is implemented using Guttman's R-Tree with
quadratic split algorithm. It allows for efficient search of R2 objects. Sorted
Collection provides almost the same methods as FieldIndex but it uses
a user-defined comparator to compare collection members. A sorted collection is
implemented using a T-Tree and is especially efficient for main-memory
databases.
| Interface | Description | Key type | Implementation | Created by |
|---|---|---|---|---|
Index |
Index with an explicitly specified key used for exact match or range queries | scalar, string or reference | B+Tree | Storage.createIndex(Class
type, boolean unique) |
Index |
The same as above but assuming that there can be a lot of duplicate key values | scalar, string or reference | B+Tree | Storage.createThinkIndex(Class
type) |
Index |
Random access index optimized for accessing elements both by key and by position | scalar, string or reference | B+Tree | Storage.createRandomAccessIndex(Class
type, boolean unique) |
FieldIndex |
Index constructed for one of the object fields | scalar, string or reference | B+Tree | Storage.createFieldIndex(Class
type, String fieldName, boolean unique) |
FieldIndex |
Random access field index optimized for accessing elements both by key and by position | scalar, string or reference | B+Tree | Storage.createRandomAccessFieldIndex(Class
type, String fieldName, boolean unique) |
MultidimensionalIndex |
Multidimensional index allows to select objects using search conditions for various fields. This index provides better performance than traditional indices if there is no single field in the query with good selectivity. In case of traditional indices (like B-Tree) it is necessary to join large results sets or perform sequential search among large number of objects. Compound index allows to select object only if all keys are specified (or at least prefix keys). And multidimensional index allows to perform search simultaneously taking in account all specified restrictions. But as far as it is implemented using binary tree which nodes contains references to the indexed objects, this index is efficient mostly for in-memory databases or in cases when all traversed objects fits in memory. | scalar, string or any other comparable type | KD-Tree | Storage.createMultidimensionalIndex(Class cls, String[] fieldNames)or Storage.createMultidimensionalIndex(MultidimensionalComparator comparator)
|
BitIndex |
Bit index for searching object by bitmap of properties | persistent object | B+Tree | Storage.createBitIndex() |
IPersistentSet |
Set of persistent objects | persistent object | B+Tree | Storage.createSet() |
IPersistentSet |
Scalable set of persistent objects (can efficiently handle both small and large number of members) | persistent object | Link or B+Tree | Storage.createScalableSet() |
IPersistentList |
List of persistent objects providing random access | persistent object | B+Tree | Storage.createList() |
IPersistentList |
Scalable list of persistent objects (can efficiently handle both small and large number of members) | persistent object | Link or B+Tree | Storage.createScalableList() |
IPersistentMap |
Scalable map of persistent objects (can efficiently handle both small and large number of members) | persistent object | Sorted array or B+Tree | Storage.createMap() |
SpatialIndex |
Index for spatial objects with integer coordinates | Rectangle | R-Tree | Storage.createSpatialIndex() |
SpatialIndexR2 |
Index for spatial objects with real coordinates | RectangleR2 | R-Tree | Storage.createSpatialIndexR2() |
SortedCollection |
Index with a user-defined comparator | any | T-Tree | Storage.createSortedCollection(PersistentComparator
comparator, boolean unique) |
FullTextIndex |
Full text index | any text | B-Tree based inverse index | Storage.createFieldIndex(FullTextSearchHelper helper) |
select * from T where x=?;
Perst relations can be used to implement
simple joins, like the following SQL query fetching all orders to a particular
vendor:
select * from O Order, V Vendor where O.vendorID = V.id AND V.name='ABC';
In Perst it is possible to first select a vendor
using an index search and then traverse the corresponding relations to locate
all the orders to this vendor.
But sometimes, it is necessary to implement more complex queries.
This is also possible in Perst, without the need to write queries in some
special (non-procedural) language. The Projection class is
used to combine the results of several simple operations (such as an index
search). Let us start explanation of this class with an example. Consider that
we need to know the details of all the orders with names starting with 'D1'
shipped by vendors having names starting with 'V1' or 'V3'. The SQL statement
which performs this query is as follows:
select * from O Order, V Vendor, D Detail where D.name like 'D1%' and O.detailID = D.id
and O.vendorID = V.id and (V.name like 'V1%' OR V.name like 'V3%');
And now, we shall see how this can be done
in Perst. Consider that we have indices for details and vendors:
FieldIndex detailIndex = db.createFieldIndex(Detail.class, "name", true);
FieldIndex vendorIndex = db.createFieldIndex(Vendor.class, "name", true);
Set of requested orders can be obtained in
the following way:
// Projection from vendors to orders (using "orders" field of Link type)
Projection v2o = new Projection(Vendor.class, "orders");
// Projection from details to orders (using "orders" field of Link type)
Projection d2o = new Projection(Detail.class, "orders");
// Select vendors with name like 'V1%'
v2o.project(vendorIndex.getPrefix("V1"));
// Select vendors with name like 'V3%'
v2o.project(vendorIndex.getPrefix("V3"));
// Select details with name like 'D1%'
d2o.project(detailIndex.getPrefix("D1"));
// Join projections
v2o.join(d2o);
// Get array of requested orders
Order[] orders = (Order[])v2o.toArray(new Order[v2o.size()]);
As you can see, the Projection class is used for several purposes:
It is possible to derive your own class from the Projection class
to implement more sophisticated projections than using a single projection
field.
If you need to perform a selection sort in some particular
order then the most efficient way is to use the FieldIndex method for
it. When you select objects using index, the selected objects are sorted by the
search key. If you need to perform a selection sort by a field which is not the
search key, then you can use the Arrays.sort method. For
example, if in the query described above we need to sort orders by the price
field, it can be done with the following statement:
Array.sort(orders, new Comparator() {
public int compare(Object a, Object b) { return ((Order)a).price - ((Order)b).price; }
});
commit,
rollback or close methods.The committing of a transaction causes a synchronous write of changed pages to the disk. Synchronous writes (where an application is blocked until data is really flushed to the disk) are very expensive operations. The average positioning time for modern disks is about 10 msec, so they are usually not able to perform more than 100 writes in random places in one second. As a transaction usually consists of several updated pages, this leads to an average performance of about 10 transaction commits per second.
Performance can be greatly increased if you minimize the number of commits (larger transactions). Perst uses a shadow mechanism for transaction implementation. When an object is changed for the first time during a transaction, a shadow of the object is created and the original object is kept unchanged. If the object is updated multiple times during the transaction, the shadow is created only once. Because it uses shadows, Perst does not need a transaction log file. Therefore, in Perst, long transactions will not cause a transaction log overflow as in most other DBMSes. Quite the contrary, if you do not call commit at all. Perst works as a DBMS without transaction support, adding almost no overhead for supporting transactions.
The only drawback of long transactions is the possibility of losing a lot of changes in case of a fault. Perst will preserve consistency of the database, but all changes made since the last commit will be lost.
Perst also supports per-thread transactions. Three types of
per-thread transactions are supported: exclusive, cooperative and serializable.
In an exclusive transaction, only one thread can update the database. In
cooperative mode, multiple transactions can work concurrently and the commit() method will be invoked only when the transactions of all the threads are
terminated. Serializable transactions can also work concurrently. But unlike
cooperative transactions, the threads are isolated from each other. Each thread
has its own associated set of modified objects and committing the transaction
will cause saving of only these objects to the database. To synchronize access
to the objects in case of a serializable transaction, a programmer should use
lock methods of the IResource interface. A shared lock should be set before read access to any
object, and an exclusive lock before write access. Locks will be
automatically released when the transaction is committed (so a programmer
should not explicitly invoke the unlock() method). In
this case it is guaranteed that the transactions are serializable.
It is not possible to use the IPersistent.store() method in serializable transactions (you should use IPersistent.modify()
instead). That is why it is also not possible to use the Index and the
FieldIndex containers since they are based on the B-Tree and a B-Tree directly
accesses database pages and uses the store() method to
assign an OID to an inserted object. You should use the SortedCollection based on T-Tree instead or an alternative B-Tree implementation
(see "perst.alternative.btree" property).
Starting from version 2.66, Perst provides multi-client
access to a database. This is implemented using OS file locks. Databases are
accessed in multiple-readers-single-writer mode. In order to enable
multi-client support, it is necessary to set the
"perst.multiclient.support" property. Then, all accesses to a
database should be performed within explicit transactions, i.e. transactions
should be started using the beginThreadTransaction method. If a transaction is
read-only, then the
Storage.READ_ONLY_TRANSACTION mode should be used. Several read-only transactions can be executed
in parallel. But if a transaction may modify the contents of the database, then
the Storage.READ_WRITE_TRANSACTION
mode should be used and the transaction will have exclusive access
to the database.
Transaction access to the database is synchronized for the different
processes in one system as well as for the different threads within one
process. Transactions can be nested - you can invoke beginThreadTransaction as many times as you need, assuming that the same number of endThreadTransaction calls are invoked. So, you can wrap the code of each method
accessing the database with the
beginThreadTransaction and the endThreadTransaction methods. It is not possible to upgrade a transaction from shared (READ_ONLY_TRANSACTION) to exclusive (READ_WRITE_TRANSACTION). Hence, you should always use the READ_WRITE_TRANSACTION if the transaction may modify the database.
The outermost endThreadTransaction will cause the transaction to commit and release the OS file lock. Unlike endThreadTransaction, the rollbackThreadTransaction doesn't consider the counter of nested transactions - it aborts all
nested transactions. If some process accessing the database crashes, the file
lock held by this process is automatically released. According to the
specification of the
java.nio.channels.FileChannel.lock method, some operating systems may not support shared locks. In
this case the only choice is to always use the READ_WRITE_TRANSACTION mode. Because the java.nio package was introduced in JDK 1.4, Perst is not able to provide
multi-client access with earlier JDK releases.
Replication is page based - all dirty pages from the master node page pool are sent to the slave nodes. A slave node detects the moment of transaction commit (when the root page is sent to a slave node and the state index is changed in the database header) and sets an exclusive lock in this case. The read-only transactions at the slave nodes set a shared lock and work with the previous (consistent) state of the database. When a transaction commit is performed by the master node, a slave node uses this lock to transfer the database into the next consistent state. Starting from this moment, the read-only transactions at a slave node will work with the new, consistent state of the database.
Perst supports two replication models: static and dynamic. In the static model, all slave nodes have to be defined when an application is started. The master node tries to connect to all the specified slave nodes. After establishing all possible connections with the active slave nodes, it starts working, replicating the changes to these slave nodes. The content of all statically defined slave nodes is considered to be the same as that of the master node. Thus, when a database is opened, the master expects that all the connected slave nodes have the same state as its own state and will send to them only the pages notified by the application after it is opened. If you are going to add a new static slave node, you should first (before the database is opened) manually copy the database from the master node to this new slave node.
In the dynamic model, it is possible to add new replicated nodes at each moment in time. In this case, Perst transfers the complete database to the new node, synchronizing its state with the master node state. When the new replicated slave node is synchronized with the master node, it will act in the same way as a static, replicated node: the master will send to both of them all the updates performed by the application. The dynamic mode is especially useful for main-memory databases (with an infinite page pool and a NullFile stub). In this case, the size of the database is not expected to be very large and its transfer to a new replicated node should not take a lot of time.
As far as replicated nodes are able to execute read-only
queries, the replication mechanism can also be used to improve performance by
distributing the load between several nodes. To be able to execute read-only
transactions at a slave node, an application should use per-thread transactions
wrapped by the
Storage.beginThreadTransaction(Storage.REPLICATION_SLAVE_TRANSACTION)
and the Storage.endThreadTransaction() method invocations. It is possible to run many such transactions
concurrently. Only when the master node commits a transaction, a slave node has
to set an exclusive lock blocking all read-only transactions (certainly it has
to first wait for the completion of all currently executing read-only
transactions. So, these transactions should be short enough to avoid blocking
the replication mechanism for a long time). This exclusive lock is released
very fast - immediately after placing a new root page in the pool. Now
read-only transactions will see the new state of the database.
When an application running at the master node closes the storage, the master sends a CLOSE request to all the slave nodes causing them to close their connection with the master. In case of abnormal termination of one of the slave nodes, the master continues to work with the other slave nodes. If no more slave nodes are online, the master continues to work autonomously. The Perst replication mechanism doesn't enforce any particular strategy for the recovery of nodes. It is up to the application how to handle the situation of the master crashing and choosing a new master node.
In Perst, there is the org.garret.perst.Query class which allows you to execute JSQL queries. To execute a JSQL query it is
necessary to provide the object iterator class (java.util.Iterator) of iterated
objects and the string representing the query condition. It is also possible to
specify indices, resolvers, query parameters. The result of a query execution
is another iterator which can be used to traverse through the selected objects
(objects matching the specified search criteria).
All the Perst collection classes contain the select method which allows you to filter collection members using a JSQL query. This
method is given the class of objects stored in the collection (for some
collections such as FieldIndex this is known and should not be specified) and
the JSQL condition. As a result of its work, select returns the iterator of the
selected objects.
| Example | Meaning |
|---|---|
| expression | non-terminals |
| not | terminals |
| | | disjoint alternatives |
| (not) | optional part |
| {1..9} | repeat zero or more times |
select-condition ::= ( expression ) ( traverse ) ( order )
expression ::= disjunction
disjunction ::= conjunction
| conjunction or disjunction
conjunction ::= comparison
| comparison and conjunction
comparison ::= operand = operand
| operand != operand
| operand <> operand
| operand < operand
| operand <= operand
| operand > operand
| operand >= operand
| operand (not) like operand
| operand (not) like operand escape string
| operand (not) in operand
| operand (not) in expressions-list
| operand (not) between operand and operand
| operand is (not) null
operand ::= addition
additions ::= multiplication
| addition + multiplication
| addition || multiplication
| addition - multiplication
multiplication ::= power
| multiplication * power
| multiplication / power
power ::= term
| term ^ power
term ::= identifier | number | string
| true | false | null
| current
| ( expression )
| not comparison
| - term
| term [ expression ]
| identifier . term
| function term
| count (*)
| contains array-field (with expression) (group by identifier having expression)
| exists identifier : term
function ::= abs | length | lower | upper
| integer | real | string |
| sin | cos | tan | asin | acos |
| atan | log | exp | ceil | floor
| sum | avg | min | max
string ::= ' { { any-character-except-quote } ('') } '
expressions-list ::= ( expression { , expression } )
order ::= order by sort-list
sort-list ::= field-order { , field-order }
field-order ::= field (asc | desc)
field ::= identifier { . identifier }
fields-list ::= field { , field }
user-function ::= identifier
Identifiers are case sensitive, begin with a..z, A..Z, '_' or '$' character, contain only a-z, A..Z, 0..9 '_' or '$' characters, and do not duplicate any SQL reserved words.
| abs | acos | and | asc | asin |
| atan | avg | between | by | contains |
| cos | ceil | count | current | desc |
| escape | exists | exp | false | floor |
| group | having | in | integer | is |
| length | like | log | lower | max |
| min | not | null | or | real |
| sin | string | sum | tan | true |
| upper | with |
JSQL extends ANSI standard SQL operations by supporting bit
manipulation operations. Operators and/or can be
applied not only to boolean operands but also to operands of integer type. The
result of applying
and/or operator to integer operands is an integer value with its bits set
by bit-AND/bit-OR operation. Bit operations can be used for efficient
implementation of small sets. Also raising to a power operation ^ is supported
by JSQL for integer and floating point types.
length() function.
[] operator.
If the index expression is out of the array range, then an exception will be
raised.
in operator can be used
for checking if the array contains the value specified by the left operand.
This operation can only be used for arrays of atomic types: with boolean,
numeric, reference or string components.
exists
operator. The variable specified after the exists keyword can be used as an index in arrays
in the expression preceded by the exists
quantor. This index variable will iterate through all possible array index values, until the value of the expression becomes
true or the index runs out of range. Condition
exists i: (contract[i].company.location = 'US')
will select all
details which are shipped by companies located in US, while the query:
not exists i: (contract[i].company.location = 'US')
will select all
details which are shipped only by companies outside US.
Nested exists clauses are allowed. Use of nested
exists quantors is
equivalent to nested loops using correspondening index variables. For example,
the query
exists column: (exists row: (matrix[column][row] = 0))
will select all
records, containing 0 in the elements of the matrix
field, which have
an array of integer type. This construction is equivalent to the following two
nested loops:
boolean result = false;
for (int column = 0; column < matrix.length(); column++) {
for (int row = 0; row < matrix[column].length(); row++) {
if (matrix[column][row] == 0) {
result = true;
break;
}
}
}
The order of
using indices is significant! The result of the following query execution
exists row: (exists column: (matrix[column][row] = 0))
will be
completely different from the result of the previous query. The program may
simply hang in the last case due to occurrence of an infinite loop for empty
matrices.
IndexOutOfRangeError exception to notify the end of iteration,
contains with will
just iterate through all the array elements and not throw any exception. Hence,
the performance of
contains with is better.
The like construction can be used for matching strings with a pattern
containing special wildcard characters '%' and '_'. The '_' character matches
any single character, while the '%' character matches any number of characters
(including zero characters). The extended form of the
like operator with the escape part can be used to handle the '%' and '_' characters in the
pattern as normal characters if they are preceded by a special escape
character, specified after the
escape keyword.
It is possible to search for a substring within a string by
using the in
operator. The expression ('blue' in color) will
be true for all records which contain 'blue' in their color field.
Strings can be concatenated by the + or || operators. The latter one was added only for compatibility with the ANSI SQL
standard. Because JSQL doesn't support implicit conversion to string type in expressions,
the semantic of operator + can be redefined for strings.
company.address.city = 'Chicago'
will access records referenced by the company component of the
Contract record and extract the city component of the
address field of the referenced record from the Supplier
table.
References can be checked for null by is null
or is not null predicates. Also, references can be compared for equality with each other as
well as with the special null
keyword. When a null reference is dereferenced, an exception will
be raised by JSQL.
There is a special keyword current, which can be
used to get a reference to the current record during a table search. Usually, the
current
keyword is used for the comparison of the current record identifier
with other references or for locating the current record identifier within an array
of references. For example, the following query will search the Contract
table for all active contracts (assuming that the field canceledContracts has
Contract[] type):
current not in supplier.canceledContracts
A programmer can define methods for structures, which can
be used in queries with the same syntax as normal structure components. Such
methods should have no arguments, except for a pointer to the object to which
they belong (the this pointer in Java), and should return an atomic value (of boolean,
numeric, string or reference type). Also, the method should not change the object
instance (immutable method). So, user-defined methods can be used for creating virtual components - components which are not stored in a database, but instead are
calculated using the values of other components. For example, you can use the standard
Java
java.util.Date class with such methods as getYear(), getMonth() .
. . . Thus, it is possible to specify queries like: "delivery.getYear = 99"
in an application where the delivery record field
has the
java.util.Date type.
| Name | Argument type | Return type | Description |
|---|---|---|---|
| abs | integer | integer | absolute value of the argument |
| abs | real | real | absolute value of the argument |
| sin | real | real | sin (rad) |
| cos | real | real | cos (rad) |
| tan | real | real | tan (rad) |
| asin | real | real | arcsin |
| acos | real | real | arccos |
| atan | real | real | arctan |
| exp | real | real | exponent |
| log | real | real | natural logarithm |
| ceil | real | real | the smallest integer value that is not less than the argument |
| floor | real | real | the largest integer value that is not greater than the argument |
| integer | real | integer | conversion of real to integer |
| length | array | integer | number of elements in array |
| lower | string | string | lowercase string |
| real | integer | real | conversion of integer to real |
| string | integer | string | conversion of integer to string |
| string | real | string | conversion of real to string |
| upper | string | string | uppercase string |
To be able to use indices, a programmer has to obtain a query
object and register the indices in it. JSQL supports the Perst Index and FieldIndex indices. The indices are located by JSQL by their field name. It is assumed
that class objects returned by the index iterator are the same as specified in the
select query. If you are going to use the same query instance for
selecting data from different collections (with different collection member
classes), please check that there are no name conflicts for the key fields,
i.e. index corresponding to one collection will not be applied for another
collection.
Perst has very simple JSQL queries optimizer. Perst is able to use indices for query execution if all the conditions below are true:
(company.location.city ==
"London"),
then indexes are required not only for the "city" field, but also for the
"location" and "company" fields.
Maintaining indices is the responsibility of a programmer. JSQL is only using the indices for searching elements by their keys.
Database class. This class emulates a relational database on top of Perst. Certainly,
even with this class, Perst provides a very object-oriented way of dealing with
persistent data.
The Database class allows you to create/drop tables, add/drop indices, create/update/delete
records. It automatically updates indices when a new record is added or
removed. Also, the Database class makes it possible to prepare and execute queries.
The Database class is used as a wrapper for the underlying Perst storage. The storage
passed to the constructor of the Database class should be opened and either not initialized,
or initialized before by the same Database constructor. It is not possible
to attach a database to the Perst Storage with an existing root object other than one set
by the Database constructor.
A table in a database is associated with the Java class.
Almost all methods of the database class require passing of the Class parameter or persistent object instance.
In the latter case, class of this object is used. If Perst finds no table associated with the class, it tries to find a table
for the superclass and so on... Hence, it is possible to have a "polymorphic" table - a table containing records
(objects) of multiple classes derived from this one. Methods dealing with an object instance
have an overloaded version where the class can be specified explicitly. This
makes it possible to place an object in a table associated with some particular interface.
A database provides two ways of querying tables: using simple
and prepared queries. In the first case, Database.select directly returns the result of query execution (iterator). In the second case,
it is possible to execute a prepared query multiple times, specifying different
values of parameters.
It is responsibility of programmer in Perst
to maintain consistency of indices: before updating key field it is necessary to exclude
the object from the index and after assigning new value to the key field - reinsert it in the index.
Database class provides excludeFromAllIndices and
includeInAllIndices methods, which can be used to include/exclude object in all related indices.
If you know exactly which field will be updated, it is more efficient to use
includeInIndex and excludeFromIndex methods specifying name of the updated field.
But it is more convenient to use Database.updateKey method which do all this steps itself:
exclude from index, update, mark object as been modified and reinsert in index.
It is safe to call updateKey method for fields which are actually not used in any index -
in this case excludeFromIndex/includeInIndex does nothing.
Please find more information on the Database class methods in
Javadoc and look at the JSQL-SSD example -
Supplier-Shipment-Detail database
built using the Database class.
Starting in version 4.04, Perst provides another way to construct queries ? a code generator. This is a special interface class in which methods correspond to JSQL operations. Consider the following example, in SQL:
select * from Person where salary > 10000 and age <= 40 order by name ;
In Perst, such a query can be written in this way:
for (Person p : db.<Person>select(Person.class, "salary > 10000 and age <= 40 order by name"))
{
...
}
Using the code generator, the query can be built as follows:
Query<Person> query = db.<Person>createQuery(Person.class);
CodeGenerator code = query.getCodeGenerator();
code.predicate(code.and(code.gt(code.field("salary"), code.literal(10000)),
code.le(code.field("age"), code.literal(40))));
code.orderBy("name");
for (Person p : query)
{
...
}
Most of the methods of the CodeGenerator class return a result
of type CodeGenerator.Code interface. This interface provides
no methods, since the structure of nodes that it generates is opaque to the
programmer. These nodes are used as operands for other methods, in order to
construct an abstract syntax tree (AST) representing the query.
Method CodeGenerator.predicate(Code condition) specifies the
query predicate. If this method is not invoked, then the query will select
all instances of the specified class.
Method CodeGenerator.orderBy allows the user to specify the
required sort order. It is possible to call this methods several times, in
which case sorting will be performed by all specified fields in the
specified order, as illustrated in another example. Here the operation is performed in SQL:
SELECT * FROM Car WHERE color IN ('black, 'silver')
AND mileage < 100000 AND price < 20000 AND (airCondition OR climatControl)
ORDER By price desc, model asc;
Below is a code fragment constructing this query using CodeGenerator:
Query<Car> query = db.<Car>createQuery(Car.class);
CodeGenerator code = query.getCodeGenerator();
code.predicate(code.and(code.and(code.and(code.in(code.field("color"),
code.list(code.literal("black"),
code.literal("silver"))),
code.lt(code.field("mileage"),
code.literal(10000))),
code.lt(code.field("price"),
code.literal(20000))),
code.or(code.field("airCondition"),
code.field("climatControl"))));
code.orderBy("price", false);
code.orderBy("model", true);
for (Car car : query)
{
...
}
The query should be constructed using the createQuery methods
of the Storage or Database classes. Method
Query.getCodeGenerator resets the query (clears all specified
parameters, order by clauses, and query predicates), so that it is possible
to call these methods several times for the same query. There are two
versions of these methods: one takes the Class parameter,
specifying the class for which the query will be constructed. Another,
overloaded version takes no parameters, on the assumption that that the
class was previously associated with the query using the
Query.setClass method.
Concerning the interface to a database system, there is the
object persistence aspect which should control transparent object loading and
storing to/from the storage. Without AspectJ, Perst uses a programmer
controlled, recursive object loading mechanism (when some object is requested,
all objects referenced from this object will also be fetched unless they
redefine recusriveLoading method to
return false. To store a modified object in a database, a programmer should
explicitly call the
store() or the modify() method. With AspectJ this is not needed: persistence aspect will
automatically load referenced object and mark it as modified when values of
some of the object’s fields are changed. Also, there is no need to derive
all the persistence-capable classes from the Persistent base class;
it can be easily done using the AspectJ declaration construction.
The AspectJ interface to Perst was developed by
Patrick Morris-Suzuki. It is located in
the package:
org.garret.perst.aspectj. To build it, you can use
perst/src/compile_aspectj.bat batch file (AspectJ should be properly installed before). It will
build
perst/lib/perst_aspectj.jar file. Each persistence-capable method must implement
org.garret.perst.aspectj.AutoPersist, which is a marker interface with no methods. However, the user can
avoid this by writing a simple aspect to define the persistence-capable classes
like this:
public aspect PersistentClasses {
declare parents: mypackage.* implements AutoPersist;
}
Then you can build your application like
this:
ajc *.java -aspectpath /perst/lib/perst_aspectj.jarRestrictions of AspectJ API to Perst:
hashCode() and equals() methods will work correctly only after an object is assigned its OID.
In the perst/tst/aspectj directory there is an example of an application using AspectJ
interface to Perst: Guess. It is the same example as "Guess an
animal" located in the perst/tst directory (which does not use
AspectJ), but it doesn't contain explicit calls to the store() method. Also, the original Guess example recursively loaded the while game tree
when the root object is accessed; whereas the AspectJ version fetches object
instances only on demand. You can build the examples using the compile.bat script and run them using the Guess.bat script.
With JAssist you can use a normal Java compiler. The Perst
translator for JAssist automatically makes all classes matching specific name
patterns persistence capable. With this translator, it is not required for the application
classes to be derived from the Persistent class and provide an empty default constructor.
To be able to use the JAssist interface, you should use the special JAssist class loader instead of the default Java class loader to load all classes which you want to make persistence capable. Classes with fully qualified names matching one of the specified patterns will be transformed during loading. Other classes are loaded unchanged.
I have slightly modified the JAssist code to be able to distinguish access to the this field
and to the fields of foreign objects. This makes it possible to eliminate extra overhead
for the this field access. If you are going to use the original JAssist library,
then you should comment my code using !isSelfReader() condition.
In this case Perst will not be able to handle access to the foreign (non this) fields.
You should use the getter/setter methods instead.
Alternatively, you can replace this condition with true,
allowing access to foreign fields but with significant degradation of
performance and increased code size, because in this case before ALL accesses
to fields of a persistence-capable object, a call to load() method will be inserted.
The modified versions of the JAssist files are included in the Perst distribution in the
perst/src/org/garret/jassist/edit directory.
java.lang.Object, then, replace the superclass with org.garret.perst.Persistent.
Persistent.load() method at the beginning of each instance method.
Persistent.modify() method before each write to a field.
recursiveLoading method returning false.
The JAssist interface has only one restriction: you should not access the fields of any persistent object other than a this object in a constructor. Below is an example using the JAssist interface:
package com.mycompany.mypackage;
import org.garret.perst.jassist.PerstTranslator;
import javassist.*;
public class MyApp {
public static void main(String[] args) {
Translatator trans = new PerstTranslator(new String[]{"com.mycompany.*"});
ClassPool pool = ClassPool.getDefault();
Loader cl = new Loader(pool);
cl.addTranslator(pool, trans);
Thread.currentThread().setContextClassLoader(cl);
cl.run("Main", args);
}
}
In this example all the classes from com.mycompany.mypackage except MyApp will be loaded by the JAssist class loader and
automatically made persistence capable.
To build the JAssist interface, you should use the compile-aspectj.bat
file in the perst/src directory. The javassist.jar file is
included in the Perst distribution and located in the perst/lib directory. You
can find examples using the JAssist interface in the perst/tst/jassist directory.
Perst performs cyclic scanning of the bitmap pages. It
keeps the identifier of the current bitmap page and the current position within
the page. Every time an allocation request arrives, scanning of the bitmap
starts from the current position. When the last allocated bitmap page is
scanned, scanning continues from the beginning (from the first bitmap page)
until the current position is reached. When no free space is found after a full
cycle through all bitmap pages, a new block of memory is allocated. The size of
the extension is the larger of the size of the allocated object and the extension
quantum. The extension quantum is a parameter of the database, specified in the
constructor. A bitmap is extended to be able to map additional space. If the virtual
space is exhausted and no more bitmap pages can be allocated, then the OutOfMemory exception
is raised.
Allocating memory using bitmaps provides a high locality of references (objects are mostly allocated sequentially) and also minimizes the number of modified pages. Minimization of the number of modified pages is significant when the commit operation is performed and all dirty pages should be flushed to the disk. When all cloned objects are placed sequentially, the number of modified pages is minimal and so, the transaction commit time is also reduced. Using the extension quantum also helps to preserve sequential allocation. Once a bitmap is extended, objects will be allocated sequentially until the extension quantum will be completely used up. Only after reaching the end of the bitmap, scanning restarts from the beginning, searching for holes in previously allocated memory.
To reduce the number of bitmap page scans, Perst associates a descriptor with each page, which is used to remember the maximal size of the hole on the page. Calculation of the maximal hole size is performed in the following way: if an object of size M cannot be allocated from this bitmap page, then the maximal hole size is less than M. Thus, M is stored in the page descriptor if the previous value of the descriptor is larger than M. For the next allocation of an object of size greater or equal to M, we will skip this bitmap page. The page descriptor is reset when some object is deallocated from this bitmap page.
Some database objects (like hash table pages) should be aligned on a page boundary to provide more efficient access. The Perst memory allocator checks the requested size and whether it is aligned on a page boundary. Then, the address of the allocated memory segment is also aligned on a page boundary. A search for a free hole will be faster in this case, because Perst increases the step size of the current position increment according to the value of the alignment.
To be able to deallocate memory used by an object, Perst needs to keep information about the object size somewhere. The Perst memory allocator deals with two types of objects - normal table records and page objects. All table records are prepended by a record header, which contains the record size and a pointer to an L2-list linking all records in the table. So, the size of the table record object can be extracted from the record header. Page objects always occupy the whole database page and are allocated at the positions aligned on a page boundary. Page objects have no headers. Perst distinguishes page objects from normal objects by using a special marker in the object index.
When an object is modified the first time, it is cloned (a copy of the object is created) and the object handle in the current index is changed to point to the newly created object copy. The shadow index still contains the handle which points to the original version of the object. All changes are done with the object copy, leaving the original object unchanged. Perst makes a mark in a special bitmap page of the object index which contains the modified object handle.
When a transaction is committed, Perst first checks if the size of the object index was increased during the current transaction. If so, it also reallocates the shadow copy of the object index. Then, Perst frees memory for all "old objects", i.e. objects which were cloned within a transaction. Memory cannot be deallocated before a commit, because we want to preserve the consistent state of the database by keeping a cloned object unchanged. If we deallocate memory immediately after cloning, a new object can be allocated in place of the cloned object and we lose consistency. As far as memory deallocation is performed in Perst by bitmaps using the same transaction mechanism as for normal database objects, deallocation of object space will require clearing some bits in the bitmap page, which also should be cloned before modification. Cloning the bitmap page will require new space for allocating the page copy, and we can reuse the space of deallocated objects. However, it is not acceptable due to the reason explained above - we will lose database consistency. That is why deallocation of an object is done in two steps. When an object is cloned, all bitmap pages used for marking the object’s space are also cloned (if not cloned before). So when a transaction is committed, we only clear bits in the bitmap pages and no more requests for allocation of memory can be generated at this moment.
After deallocation of old copies, Perst flushes all modified pages to the disk to synchronize content of the memory and the disk files. After that Perst changes the current object index indicator in the database header to switch the roles of the object indices. Now the object index, which was current becomes shadow, and the shadow index becomes current. Then, Perst again flushes the modified page (i.e. page with the database header) to the disk, transferring the database to a new, consistent state. After that Perst copies all modified handles from the new object index to the object index which was previously shadow and now becomes current. At this moment, contents of both indices are synchronized and Perst is ready to start a new transaction.
The bitmap of the modified object index pages is used to minimize the time taken for committing a transaction. Not the whole object index, but only its modified pages should be copied after committing of the transaction bitmap is cleared.
When a transaction is explicitly aborted by the Storage.rollback
method, the shadow object index is copied back to the current
index, eliminating all changes done by the aborted transaction. After the end
of copying, both indices are identical again and the database state corresponds
to the moment before the start of the current transaction.
The allocation of object handles is done by the free handles list. The header of the list is also shadowed and two instances of the list header are stored in the database header. Switching between them is done in the same way as switching of object indices. When there are no more free elements in the list, Perst allocates handles from the unused part of a new index. When there is no more space in the index, it is reallocated. An object index is the only entity in a database which is not cloned on modification. Instead of this, two copies of the object index are always used.
There are some predefined OID values in Perst. OID 0 is reserved as the invalid object identifier. OIDs starting from 1 are reserved for bitmap pages. The number of bitmap pages depends on the database’s maximum virtual space. For one terabyte virtual space, 8 kb page size and 64 byte allocation quantum, 32 K bitmap pages are required. So, 32 K handles are reserved in the object index for the bitmap. Bitmap pages are allocated on demand, when the database size is extended. Hence, the OID of the first user object will be 0x8002.
The recovery procedure is trivial in Perst. There are two
instances of object index, one of which is current and the other corresponds to
the consistent database state. When a database is opened, Perst checks the database
header to detect if the database was closed normally. If not (the dirty flag
is set in the database header), then Perst performs a database recovery.
Recovery is very similar to the rollback of a transaction. The indicator of the
current index in a database object header is used to determine the index
corresponding to the consistent database state and the object handles from this
index are copied to another object index, eliminating all changes done by the uncommitted
transaction. As long as the only action performed by the recovery procedure is
copying of an object’s index (in reality only handles having different
values in the current and shadow indices are copied to reduce the number of
modified pages) and the size of the object index is small, a recovery can be performed
very fast. Fast recovery procedures reduce the "out-of-service" time
of an application.
The table below summarizes the special features of Perst:
perst.jarbuild.
You can also build it yourself using the
compile.bat script in the
src directory. The only thing you need to work with the database is to
include this JAR in your class path. A template for a Perst application is as
follows:
import org.garret.perst.*;
public class YourPersistentClass extends Persistent {
int x;
String y;
Link links;
...
void doUpdate() {
x = 1;
y = "Hello World";
links = getStorage().createLink();
store(); // save changes in the database
}
}
public class YourRoot extends Persistent {
YourPersistentClass ref;
Index myIndex;
void foo() {
ref.doUpdate(); // referenced object has no to be explictely loaded
}
};
public class YourApplication {
static public void main(String[] args) {
String databaseFilePath = args[0]; // path to the database file
int pagePoolSize = Integer.parseInt(args[1], 10); // size of page pol in bytes
Storage storage = StorageFactory.getInstance().createStorage();
storage.open(databaseFilePath, pagePoolSize);
YourRoot root = (YourRoot)storage.getRoot();
if (root == null) {
// Storage was not initialized yet
root = new YourRoot();
root.myIndex = storage.createIndex(String.class, true); // unique index
storage.setRoot(root);
}
root.foo(); // all objects referenced from root are implicitly loaded
YourPersistentClass obj = root.myIndex.get(new Key("John")); // find object with key "John"
... // do something else with database
storage.commit(); // commit transaction
... // do something else with database
storage.close(); // commit the last transaction and close the database
}
}
You can also look at these Perst examples:
Link interface.
Database class (used when JSQL predicate contains indirect access to the fields, like "album.label.name=?")
Database class.
Version and VersionHistory classes and their performance.
Projection class.
The tests are located in the tst directory and can
be built using the compile.bat
script.
src contains Perst sources for JDK 1.5 with support of generic classes.
Also perst.jar now corresponds to JDK 1.5 version.
If you need version of Perst for JDK 1.4, please use perst14.jar which sources are now in
src14 directory.
IPersistent interface
(*) only if weak references are not supported by target environment. Weak references are avaialble in CLDC1.1
Perst Lite does not use the Java reflection mechanism. So,
to be able to store/fetch persistent objects to/from the database, the
programmer must provide special pack/unpack routines. Perst provides the ISerializable interface, which defines the following methods:
/** * Interface which should be implemented by any object that can be serialized. * Implementation ofwriteObjectreadObjectmethods should * store/fetch content of the object to/from the provided stream. * The order of storing/fetching fields should be the same in both methods. */ public interface ISerializable { /** * Store object fields in the stream. * @param out stream to which object is written */ void writeObject(IOutputStream out); /** * Fetch object fields from the stream * @param in stream from which object data should be fetched */ void readObject(IInputStream in); /** * Check if objects can be pinned in cache to survive transaction commit. * Commit performs cleanup of strong object cache, leaving only limited number * of most frequently used objects (pin objects). * It is not possible to pin object which contains references to other persistent objects, * since such reference can point to the object which thrown away from the cache. * @return if object can be pinned in memory (object contains no referenes) */ boolean isPinnable(); }
So, each persistence-capable class should implement this
interface (actually it should be derived from the
org.garret.perst.Persistent class which provides the basic implementation of this interface)
and implement the writeObject/readObject methods. The interfaces IOutputStream and IInputStream provide methods for storing and fetching respectively, values of different
types. Below are example definitions of pack/unpack methods:
public class Test extends Persistent {
int i;
float f;
String s;
Test o;
public writeObject(IOutputStream out) {
super.writerObject(out);
out.writeInt(i);
out.writeFloat(f);
out.writeString(s);
out.writeObject(o);
}
public void readObject(IInputStream in) {
super.readObject(in);
i = in.readInt();
f = in.readFloat();
s = in.readString();
o = (Test)in.readObject();
}
}
Also, because of restrictions on the JDK
1.1 reflection mechanism, a persistence-capable class should be:
org.garret.perst.reflect package which is replacement of
standard Java java.lang.reflect package. But descriptors of class, fields and methods
should be created by programmer or generated by SerGen utility.
Reflection support as well as Perst classes using reflection are provided in
perst-rms-reflect.jar and perst-jsr75-reflect.jar archives.
Reflection support adds about 100kb to jar file size. Them should be useed only if you need JSQL or field indices.
below is an example of describing class format using org.garret.perst.reflect package:
import org.garret.perst.*;
import org.garret.perst.reflect.*;
import java.util.Date;
class MyClass extends Persistent {
int i;
String s;
MyClass next;
void foo(int x) {
// some user's code...
}
String f(Date d, float f) {
// some user's code...
}
int g() {
// some user's code...
}
// It is necessary to create instance of Type class to register it in reflection provider
public static final Type TYPE =
return new Type(MyClass.class, // described class
Persistent.class, // basse class
new Class[]{}, // implemented interfaces
new Field[] { // list of the fields
new Field(Integer.class, // field type: J2ME doesn't provide classes for builtin types,
// so we have to use Integer.class instead of int.class
"i", // field name
Modifier.Indexable|Modifier.Unique) // field modifiers used by Database class
{
// Getter method
public Object get(Object obj) {
return new Integer(((MyClass)obj).i);
}
// Setter method
public void set(Object obj, Object value) {
((MyClass)obj).i = ((Integer)value).intValue();
}
// It is not necessary to redefine getInt/setInt methods
// because default implementation will use set/get method,
// but certainly such implementation is more efficient
public int getInt(Object obj) {
return ((MyClass)obj).i;
}
public void setInt(Object obj, int value) {
((MyClass)obj).i = value;
}
},
new Field(String.class, "s", Modifier.FullTextIndexable) {
public Object get(Object obj) { return ((MyClass)obj).s; }
public void set(Object obj, Object value) { ((MyClass)obj).s = (String)value; }
},
new Field(MyClass.class, "next") {
public Object get(Object obj) { return ((MyClass)obj).next; }
public void set(Object obj, Object value) { ((MyClass)obj).next = (MyClass)value; }
},
},
new Method[] { // list of the methods
new Method(null, // type of void method shoudl be null
"foo", // method name
new Class[]{Integer.class}) // method parameter types
{
// Invocation method
public Object invoke(Object obj, Object[] args) {
((MyClass)obj).foo(((Integer)args[0]).intValue());
return null; // In case of void method we should return null
}
},
new Method(String.class, // method return type
"f", // method name
new Class[]{Date.class, Float.class}) // method parameter types
{
public Object invoke(Object obj, Object[] args) {
return ((MyClass)obj).f((Date)args[0], ((Float)args[1]).floatValue());
}
},
new Method(Integer.class, "g", new Class[]{}) {
public Object invoke(Object obj, Object[] args) { return new Integer(((MyClass)obj).g()); }
},
})
{
// Method creating instance of the class (default implementation uses
// java.lang.Class.newInstance, but providing specialized implementation
// allows to instantiate non-public classes with non-public default constructor
public Object newInstance() {
return new MyClass();
}
};
}
}
Please look at JSQLTest1 and JSQLTest2 MIDP examples in perst/tst11/midp
illustrating use of reflection, JSQL and Database class in Perst Lite.
perst-rms-reflect.jar and perst-jsr75-reflect.jar
also provides reflection support and Perst stuff based on reflection: field indices,
JSQL query language, RDMS facade class. There is also separate package with XML import/export
support: perst11xml.jar which can be used in conjunction with any Perst Lite package
to provide way of exporting/importing Perst Lite database to/from XML.
Note for Blackberry users:
-blackberry option which forces SerGen to use
temporary variables to store references to class objects.
java.io.InputStream.skip method which is very naively
implemented in some J2ME runtimes: it just reads data from the stream bye-by-byte until
reaches the requested position. For example at Nokia
6110 (S60 3-rd edition) performing 10000 index searches through 10000
records takes ... more than 3 hours! And the database size in this example
is only about 600kb. So this size can easily be loaded in memory.That is why Perst Lite includes special version of PagePool - InfinitePagePool which preloads the whole database file in memory. It was also possible with mainstream Perst version to use infinite page pool. But this implementation of infinite page pool for Perst Lite is much more efficient and requires less memory. But the main difference is that it loads the while database in memory during open. With this implementation search time of the test mentioned above is about 8 seconds Compare it with 3 hours!
With JSR 75 it is more or less straightforward - you should create database file at
the PC using perst11.jar or any simulator which stores files created using JSR 75 API in as normal files
in PC file system. But JSR 75 is not available everywhere. And format of RMS storage
is opaque and vendor specific. But it is possible to import data in the RMS record storage at low level
(as Perst pages) - it is much faster then just inserting data in the database (because
there no need to construct index, allocate space, pack objects,...).
Implementation of org.garret.perst.impl.RmsFile on top of RMS includes static importData
method which can be perform such import from the arbitrary input stream:
/**
* Import data from specified input stream to record store
* @param storeName storage name prefix. If there is limitation for storage size, then several storages with
* this prefix and suffixes 0, 1, 2,... will be created.
* @param in input stream. Data can be imported from resource file (jar file), from network connection or from other
* data source
* @return true if data was successfully imported, false if storage is not empty or read operation from input stream
* is failed
*/
public static boolean importData(String storageName, InputStream in) throws IOException, RecordStoreException;
Using this methods it is possible to prepare database at PC database file (using
emulator or perst11.jar library), include it in the midlet .jar file
as resource file and import this file to the RMS storage:
protected void startApp()
{
storage = StorageFactory.getInstance().createStorage();
org.garret.perst.impl.RmsFile.importData("testindex.dbs", getClass().getResourceAsStream("/testindex.dbs"));
storage.open("testindex.dbs");
...
}
The importData method imports the resource file only when RMS storage is
empty. So the is no need to programmer to perform extra checking -
if database is initialized or not.It is possible to use importData method not only for importing resource files from .jar file but also for importing data from any data source, for example from network connection.
writeObject/readObject methods. The only thing you have to do is to add stubs for the writeObject/readObject methods in the persistence-capable classes and specify the path to
them as arguments to the SerGen utility. It is necessary to specify stubs in the
list of persistence-capable classes so that the SerGen utility does not have to
load the whole project, resolve all class paths and source locations, and try
to guess the right place and indentation of where to insert these generated
methods. Rather, SerGen uses a simplified solution that only requires the
programmer to explicitly specify the classes and stubs for the writeObject/readObject/getClassDescriptor methods. It solves both problems: SerGen does not have to guess
which classes are persistence capable or where to generate the bodies of these
methods.
If you do not want SegGen utility to replace writeObject/readObject/getClassDescriptor
methods explicitly written by you in some particular class you can mark them with /* @sealed */
comment. In this case SerGen will leave them untouched.
The getClassDescriptor should be specified if you are going to use reflection (field indices, JSQL queries,...).
You can also mark indexable fields for database class using @modifier tag in comments.
Suported modifiers are enumerated in org.garret.perst.Modifier class.
These modifiers should be specified after @modifier tag separated by space or commas:
/**
* Some integer key field
* @modifier Indexable, Unique
*/
int key;
To make Javadoc tool correctly proceed this tag, you should add -tag modifier:f:"Modifiers:" option
to javadoc tool.The SerGen utility is invoked as follows:
java SerGen list-of-files
Here, list-of-files is a list of
source files (*.java) or directories where these source files are located.
SerGen recursively traverses these directories, selects files with the .java
extension and tries to locate the writeObject/readObject methods in them. If the methods are located, then the file is patched: SerGen
first generates a XXX.java.tmp file, removes the original file, and renames XXX.java.tmp to XXX.java. The paths of all the patched files are reported on the console
output.This is the example of the original file (written by programmer):
import org.garret.perst.*;
import org.garret.perst.reflect.*;
import java.util.Date;
class MyClass extends Persistent {
/**
* Some integer field
* @modifier Indexable, Unique
int i;
/**
* Some string fieldx
* @modifier FullTextIndexable
*/
String s;
MyClass next;
void foo(int x) {
// some user's code...
}
String f(Date d, float f) {
// some user's code...
}
int g() {
// some user's code...
}
public void writeObject(IOutputStream out) {
}
public void readObject(IInputStream out) {
}
private static Type getClassDescriptor() {
return null;
}
public static final Type TYPE = getClassDescriptor();
}
And this the result of applying SerGen utility to this file:
import org.garret.perst.*;
import org.garret.perst.reflect.*;
import java.util.Date;
class MyClass extends Persistent {
/**
* Some integer field
* @modifier Indexable, Unique
*/
int i;
/**
* Some string field
* @modifier FullTextIndexable
*/
String s;
MyClass next;
void foo(int x) {
// some user's code...
}
String f(Date d, float f) {
// some user's code...
}
int g() {
// some user's code...
}
public void writeObject(IOutputStream out) {
super.writeObject(out);
out.writeInt(i);
out.writeString(s);
out.writeObject(next);
}
public void readObject(IInputStream in) {
super.readObject(in);
i = out.readInt();
s = out.readString();
next = (MyClass)out.readObject();
}
private static Type getClassDescriptor() {
return new Type(MyClass.class, Persistent.class, new Class[]{}, new Field[]{
new Field(Integer.class, "i", Modifier.Indexable|Modifier.Unique) {
public Object get(Object obj) { return new Integer(((MyClass)obj).i); }
public void set(Object obj, Object value) { ((MyClass)obj).i = ((Integer)value).intValue(); }
public int getInt(Object obj) { return ((MyClass)obj).i; }
public void setInt(Object obj, int value) { ((MyClass)obj).i = value; }
},
new Field(String.class, "s", Modifier.FullTextIndexable) {
public Object get(Object obj) { return ((MyClass)obj).s; }
public void set(Object obj, Object value) { ((MyClass)obj).s = (String)value; }
},
new Field(MyClass.class, "next") {
public Object get(Object obj) { return ((MyClass)obj).next; }
public void set(Object obj, Object value) { ((MyClass)obj).next = (MyClass)value; }
},}, new Method[]{
new Method(null, "foo", new Class[]{Integer.class}) {
public Object invoke(Object obj, Object[] args) { ((MyClass)obj).foo(((Integer)args[0]).intValue()); return null; }},
new Method(String.class, "f", new Class[]{Date.class, Float.class}) {
public Object invoke(Object obj, Object[] args) { return ((MyClass)obj).f((Date)args[0], ((Float)args[1]).floatValue()); }},
new Method(Integer.class, "g", new Class[]{}) {
public Object invoke(Object obj, Object[] args) { return new Integer(((MyClass)obj).g()); }},}) {
public Object newInstance() { return new MyClass(); }
};
}
public static final Type TYPE = getClassDescriptor();
}
SerGen supports custom serializers, with three different schemas triggered by the field annotations @serializable, @binary and @property.
Field types marked with the @serializable annotation implement the org.garret.perst.ISerializable interface. The field’s value is serialized inside the object containing this field using the readObject/writeObject methods of the ISerializable interface. SerGen generates the following code in the pack/unpack method for such fields:
class Diagnosis {
/**
* Unique identifier
* @serializable
*/
UUID id;
public void writeObject(IOutputStream out) {
super.writeObject(out);
id.writeObject(out);
}
public void readObject(IInputStream in) {
super.readObject(in);
id = new UUID(); id.readObject(in);
}
}
Fields marked as @binary are also serialized using their own method of this class. But in this case, the method toByteArray is used instead of readObject. And classes of this field are intended to have constructors with a byte array parameter. So @binary should be used if the goal is to avoid adding dependency on Perst code to a class, because serialization is done through a standard byte array.
There is yet another difference with the @serializable annotation. Types of this field returned by the Perst reflection mechanism are byte[], and SerGen generates the code to convert objects to/from byte arrays. So it is possible to insert this field in indices (which is especially convenient when using the Database class - the field can simply be marked as indexable). The field will be inserted into an index as a byte array key, and byte arrays can be used to locate the object.
Below is code generated by SerGen for a @binary field:
class Diagnosis {
/**
* Unique identifier
* @binary
* @modifier Unique,Indexable,FixedSize
*/
UUID id;
public void writeObject(IOutputStream out) {
super.writeObject(out);
out.writeArrayOfByte(id.toByteArray());
}
public void readObject(IInputStream in) {
super.readObject(in);
id = new UUID(in.readArrayOfByte());
}
private static Type getClassDescriptor() {
return new Type(Diagnosis.class, Persistent.class, new Class[]{}, new Field[]{
new Field(byte[].class, "id", Modifier.Indexable|Modifier.Unique|Modifier.FixedSize) {
public Object get(Object obj) { return ((Diagnosis)obj).id.toByteArray(); }
public void set(Object obj, Object value) { ((Diagnosis)obj).id = new UUID((byte[])value; }
},
};
}
public static final Type TYPE = getClassDescriptor();
}
Note the FixedSize modifier. It informs Perst that the byte array representing the key value always has the same length. It allows use of a more efficient B-Tree implementation when dealing with varying-length keys.
If it is not possible or desirable to change the sources of the class whose value is stored in the field, the @property modifier can be used. In this case, the declaring class should provide getXXX(IInputStream in) and setXXX(IOutputStream out, XXX value) where XXX is the type of the field, as shown below:
class Diagnosis {
/**
* Unique identifier
* @property
*/
UUID id;
public void writeObject(IOutputStream out) {
super.writeObject(out);
setUUID(out, id);
}
public void readObject(IInputStream in) {
super.readObject(out);
id = getUUID(in);
}
// These methods are written by the programmer
private void setUUID(IOutputStream out, UUID id) {
out.writeLong(id.getTime());
out.writeLong(id.getClockSeqAndNode());
}
private UUID getUUID(IInputStream in) {
return new UUID(in.readLong(), in.readLong());
}
}
java -classpath dbconv.jar DBConv [-encrypt KEY | -decrypt KEY] [-inflate|-deflate] DATABASE-FILE(s)
If you determine that your database has grown too large, or you decide to encrypt the database,
you can use this utility. But do not forget to update your application to incorporate the encryption and/or compression process, as well.
Encryption is performed in the same way by both the standard Perst, and Perst Lite for Java ME. This utility can be used with both for encryption/decryption. However, the format of the compressed database is different between standard Perst and Perst Lite. Perst Lite supports compression only when using the RMS (record management storage) package. And this utility can perform only compression/decompression (not encryption) for Perst Lite.
This utility is especially useful for transferring a compressed database between a PC and a target device. As mentioned above, Perst Lite supports compression only when using RMS. But the RMS package can be used on a PC only under the device emulator. If your goal is to use the perst11.jar package to prepare a Perst database to be imported onto a mobile device, or to access a database that is exported from mobile device, you should use the DBConv utility to compress the database before transferring it to the device, and to decompress the database after transferring it from the device. An uploaded compressed database can be imported into RMS storage using the org.garret.perst.impl.RmsFile.importDatabase method. And to transfer the database back to the PC, use the org.garret.perst.impl.RmsFile.exportDatabase method.
Such a problem occurs only with a strong object cache, as opposed to
weak object cache. By default, if the
java.lang.WeakReference class is available, Perst Lite uses a
weak object cache. CLDC 1.1 supports weak references, so in CLDC 1.1, Perst
Lite always uses a weak object cache. In this case, it is unnecessary to pin
all objects in the cache throughout the transaction. Only strong referenced
and modified objects are pinned. However, please notice that since CLDC 1.1
doesn't support a finalization mechanism (which would enable a garbage
collector to handle deallocation of objects), Perst Lite has to hold all
modified objects in memory. So memory overflow is still possible if a
transaction updates or inserts too much objects. To avoid this, limit the
transaction size, for example, by splitting a large transaction into several
smaller ones, when possible.
Weak object caching eliminates the problem of dangling references to the root object (or any other persistent object). If an object is referenced from some variable, it will stay in the object cache until there are no more strong references to this object. It may cause another problem - if persistent object A contains reference to persistent object B and A is referenced from some program variable and so pinned in object cache, then if B will be also pinned in object cache (since it is also strong referenced). So conceivably, if all objects in the database are directly or indirectly referenced from some root object(s) and these root object(s) are pinned in memory, then the whole database may be pinned in memory. To prevent such scenario it is recommended to use Perst collection classes instead of self-made collections based on direct references (like linked lists, for example).
Implementation of the weak object cache for Perst Lite is located in a
separate directory (to make it possible to build the Perst library without
using weak references):
perst/src11/weak/org/garret/perst/impl/LruObjectCache.java.
If you are going to build your own version of the Perst library including
some subset of Perst classes, please do not forgot about this class.
Please note that in the case of a rollback you still have to ignore all references to persistent objects stored in program variables; it doesn't matter whether weak or strong cache is used. But there is the possibility in Perst to reload modified objects during transaction rollback and so in some cases be able to reuse the values of local variables.
Setting the "perst.reload.objects.on.rollback" property instructs Perst to reload all objects modified by the aborted (rolled back) transaction. It takes additional processing time, but in this case it is not necessary to ignore references stored in variables, unless they point to the objects created by this transactions (which were invalidated when the transaction was rolled back). Unfortunately, there is no way to prohibit access to such objects or somehow invalidate references to them. So this option should be used with care.
Perst provides very simple and still powerful and flexible full text search engine. A general search engine consists of the following components:
Perst search engine assumes that lemmatizer and stemmer will be provided by user (since them depends on the supported languages and document formats). Default implementation just extract from the document texts words as sequences of letters and digits and performs no stemming at all. Case of letters is obviously ignored.
Perst support basic logical operations in search query: AND, OR, NOT, for example
(MultidimensionalIndex OR KDTree) AND NOT RTree. Names of this operators can be redefined
in FullTextSearchHelper class.
To perform strict match of the phrase it is necessary to put in quotes: "to be or not to be".
It is actually all concerning syntax of full text query - you can see, that it is very simple.
As far as there may be very large number of documents matching specified query,
locating all of them may take a lot of time and user is rarely need to get all these results
(in most cases user never look at more than first ten results). This is why
Perst search engine allows to restrict number of returned documents and search time.
But even if search engine doesn't return all matched documents, it is still able to provide
estimation of this number.
Customization and tuning of full text search in Perst are performed using FullTextSearchHelper
class. This class contains methods for parsing query, splitting document text in tokens,
stemming of the words and adjusting search parameters. The org.garret.fulltext.FullTextSearchHelper
class provides default implementation of these methods. If user want to change default behavior, he
should create his own class derived from FullTextSearchHelper and override correspondent
methods in it.
I want to say a little bit more about tuning search parameters. Perst calculates document's rank using the following criteria:
FullTextSearchHelper.getOccurrenceKindWeights()
method. Word can belongs to several categories at the same time (for example in-title
and in-bold). Perst associates with each word bit mask of occurrence kinds (type of mask is byte, that is
why number of different occurrences is limited by 8). Default lemmatizer doesn't specify any
occurrence kind (mask is zero) and default weight of word occurrence is 1. When
you are specifying weights for your own occurrence kinds, please make them greater than 1 if you assume that
this occurrence kind is more important than default word occurrence (for example in-title) or less than 1
if them are less significant than default occurrences ( for example in-comments).FullTextQueryHelper.getNearnessWeight() method.
Perst uses the following formula to calculate rank of the document:
(keywordRank*(1 + nearness*nearnessWeight)).
Default value of nearnessWeight is 10. Make it smaller if nearness of words in you application is not so
important (users mostly search independent notions). You can also make it possible for
user to adjust this parameter dynamically (if implementation of getNearnessWeight()
method in your own class supports it.FullTextQueryHelper.getWordSwapPenalty()
method. Default value is also 10. But here it means that calculated distance between words will
be multiplied by this penalty. In case of the example above, the distance between words "Pentium" and "4"
will be the same for all documents - 1 (it is minimal distance which leads to the highest nearness criteria
value). By multiplying this distance on 10, we get result distance 10, so nearness for the second phrase
will be the same as for phrase "Pentium Pro CPU, 4 Gb RAM".
Full text index in Perst is just yet another collection class as well as B-Tree, KD-Tree and other indices.
To create full text index you should use Storage.createFullTextIndex method.
There are two overloaded methods with this name, one allows to specify FullTextSearchHelper
parameters, another - using default helper class.
To insert object in full text search index, you should either explicitly specify stream with object text
and language of this text (language may be needed to correctly perform stemming, default Perst implementation
doesn't perform stemming and so language is ignored), either object should
implement FullTextSearachable interface and provide text and text language itself.
Full text queries can be specified just as string (in this case Perst will perform parsing of the query using provided helper class) or in form of a tree (later will be useful in case of advance search form, will allows user to combine AND/OR predicates). More information on full text search API can be found in generated Perst API documentation. Also please look at tst/TestFullTextIndex.java example.
JDK 1.5 version of Perst also allows to easily use full text search in
The advantages of storing a Lucene index inside a Perst database are the following:
To create Perst storage of a Lucene index you should use the PerstDirectory class which is located in the perst/lucene directory:
To store references to Perst persistent objects inside a Lucene database, it is possible to add a special field to the document containing the document's OID (object identifier). The OID of the persistent object can be obtained using the
But the most convenient way of integrating full text search inside Perst is to use the "Continuous" package. This package provides a transparent RDBMS-like interface, version control, optimistic locking and built-in full text search capabilities (based on the Lucene search engine). For more information please read the Continuous section in the Perst manual.
Another important aspect missed in Perst is full text
search. Actually in real life, strict search is rarely needed - it is hard to
assume that a user knows the precise name of a book or author. Certainly, the implementation
of your own full text search engine is a very complex task, may be even more
complex than the implementation of Perst itself. Fortunately there is a good,
powerful, fast, well-known and free full text search engine for Java and .Net -
Lucene. It would be nice if Perst and Lucene could work together.
Versioning is yet another useful feature required by many
applications. Most informational systems dealing with important data (such as
finance information) have to store all versions of the documents to be able to
retrieve the state of the system at a particular moment of time. Also, such
popular services such as a wiki are based on versioning. Perst provides some
primitives for version control but they were not integrated in the database. It
is the responsibility of the programmer to maintain version histories and
locate particular versions.
From my experience with Perst, I found out that the object-oriented
model based on transparent persistence used in Perst is hard to understand for
many people not familiar with object-oriented databases. That is why people
prefer to use the
The main idea behind the development of the Continuous
package was to provide an extension of Perst which could address all the issues
discussed below:
I hope that the approach provided by the Continuous package
can actually be a good compromise in reaching all these goals. What makes me
think so is that the ideas used to solve one of these issues, also help in
other tasks. For example, multi-versioning is not useful by itself, but allows you
to achieve isolation of client transactions without setting a large number of
locks (which would result in a reduced level of concurrency and increased
probability of a deadlock). Each client transaction works with its own snapshot
of the database as seen at the moment the transaction begins, in isolation from
other transactions. Only during the transaction commit, a check is performed to
verify that changes done by the transaction do not conflict with changes
committed by other transactions. So, it is similar to optimistic locking. A
simplified table-like data structure, similar to the one used in relational
databases, allows you to hide manipulations with indices and a full text search
engine from the programmer.
The Lucene full text search engine can be used with Perst
in standalone mode - when the Lucene indices are stored in a standard OS file
system. This provides the highest level of performance, but as far as neither
Lucene, nor the file system guarantee the durability and consistency of the file
data in case of a power or system failure, such an incident can cause
corruption of the full text search index and necessitate its rebuild (which can
take a significant amount of time). Continuous provides its own implementation
of the abstract Lucene directory, allowing the storage of Lucene data directly
in a Perst database. As far as Perst BLOBs are optimized for sequential access
to the BLOB data, they are not able to efficiently perform random access
operations used by Lucene and so performance in this case will be worse than in
the case of using the standard Lucene FSDirectory (which stores data in OS
files). But, the main bonus of the Perst Lucene directory provider is that the Lucene
indices becomes transactional - they should survive application and system
faults and any inconsistency between normal and full text search data is not
possible at all.
To create a full text search index for Perst objects using the Continuous package you only needed to mark particular class fields with the FullTextSearchable annotation attribute:
The @Indexable indicates the field is a standard Perst b-tree index (optionally unique).
Pre-requirements of the Continuous package:
It is possible to build the package using either the
Javadoc documentation of the Continuous package can be
found here.
The speed of accessing data from a disk is several times
slower than the speed of accessing data in main memory. That is why caching of
data is the key to increase database performance. Perst uses a pool of pages to
optimize access to the disk. The page pool size can be specified in the
If you think that all your data should fit in the main
memory, you can use the
In some applications (such as for mobile devices) persistence is not needed. But, such Perst container classes
as
There are some constants defined in the
Now, some hints on how to increase Perst performance and reduce the size of used main memory:
But, if you will perform an update of a large number of
objects in one transaction and for each updated object a shadow is created,
then it leads to a significant increase in the database file size. If you
update each object in the database inside one transaction, the database size will
be almost doubled! However, if you perform each update in a separate
transaction, then the size of the database will be almost unchanged (because the
space of allocated shadow objects will be reused in this case). So the best
approach is to perform a commit after 100 or 1000 updates; it will reduce the overhead
of each commit and minimize the database size.
If your persistent object forms a tree or graph where all
objects can be accessed by reference from the root object, then once you load the
root object in the main memory and store a reference to it in some variable, GC
will never be able to collect any instance of a loaded persistent object
(because it will be accessible from the root object). So, when you application
accesses more and more objects from the storage, at some moment of time all of
them will have to be loaded in the main memory. This can cause space
exhaustion. To prevent this situation you should avoid storing references to
container objects which contain references to a large number of members in
variables. This is especially true when storing the root object. In this case,
GC is able to do its work and throw out from the memory, objects which are not
used at this moment (to which there are no references from the local variable).
But, it is important to note that objects accessible through the index by a key
can be normally deallocated by the garbage collector. So, special care is not
needed in this case.
Values are always stored in context of the persistent object referencing
them. Only those fields of the value objects, which are known at runtime, are
stored. If you assigned an object derived from the declared class of the field to
the value field, then, the derived fields will not be saved.
Initialization of transient attributes or resettable attributes should be done via the default constructor, or the
But really, the latter situation is better. It is important to notice that all the Perst collection classes always load their members on demand. It means that once you have a reference, for example, to FieldIndex, only the header of this collection object will be loaded and the members will be loaded on demand. This is true for all other Perst collections:
Please look at the
If you insert keys in the index
in any random order, it is most likely that loaded pages will be located at
random offsets within the database file. Thus, access to the pages requires
positioning of the disk head. The average disk positioning time for modern
disks is about 10 msec. It means that a disk is able to perform about 100 disk
accesses per second. If a database is so large that each operation with a B-Tree
requires loading of all the pages in the path from the root to a leaf page (for
a large number of objects the path will be at least 3 pages), then, the database
will be able to perform about 30 inserts (or searches) per second. Inserting 1
million objects in this worst case scenario will take about 10 hours!
Fortunately, real performance is significantly better (because of page cache
hits, file system cache...).
There are two obvious ways of
improving performance:
Perst provides four approaches
which should be used together to provide the best performance for multimedia
applications:
Now, we shall see how to use all
these four approaches. BLOBs can be created using the
Please notice that space in the
multifile segments in allocated on demand, so it is possible to specify a very
large segment size without the risk of running out of space on the disk. In
this way, we can place BLOBs in separate files. For example, in
The custom allocator is a persistent object implementing the
And now the last thing: we want
to disable caching for BLOB pages. Perst has the "perst.page.pool.lru.limit"
property which can be used to disable caching for all pages in the file which have
an offset larger than the specified limit. The default value of this parameter
is 2^60 (1L << 60). This means that all pages with an offset larger than
2^60 will not be cached in the page pool. Hence, in our example, as far as we
allocate BLOBs in the segment starting at address 2^60, the caching of BLOB
pages will be disabled automatically (there is no need to explicitly set the "perst.page.pool.lru.limit"
property).
The approaches described in this
section may seem to be too complicated. But actually, they are not so
difficult. Please have a look at the
First, create and initialize a database. Then you should use the
Please see the TestBlob example.
First, you should fill the database, by running
If your database contains a large number of small objects, then per-object
space overhead can significantly increase database size. Each Perst object
has 8-byte object header. Also, Perst uses a bitmap memory allocator for
allocating database space. Perst's allocation quantum is specified in
The only way to eliminate this object header overhead is to use the
In addition, to reduce internal fragmentation caused by the memory allocator
quantum, you can try to reduce quantum value (but as a result, the size of
the memory allocation bitmap will increase, and the allocator speed will
decrease).
The most radical way to reduce database size is to use database compression.
Perst supports two approaches to compression: read-only compression
performed by a special utility, and online compression supported in the JDK
1.5 version of Perst, and in Perst.Lite with the RMS storage layer.
Compression is performed at the page level. Database pages are compressed
when they are written to disk, and decompressed when they are loaded from
disk into the page pool. Compression will cause additional CPU load, but
database size can be significantly decreased (the actual compression ratio
depends on the stored data -- the typical reduction is about 3 times).
Despite the increased CPU demands, compression can actually improve
performance, because a larger portion of the database may be cached by the
OS file system cache.
To perform online database compression in the JDK 1.5 version of Perst, use
a special implementation of the database file,
Recommendation: increase the memory limit for the application (-Xmx option in Sun JVM).
Recommendation: decrease pinned object limit.
Recommendation: decrease size of page pool. Please also read the recommendations concerning the choice of proper page pool size in the
Tuning section.
Recommendation: split large transactions into several smaller ones.
You can avoid loading all objects in the linked list by explicitly cutting off recursion. Override the
Recommendation: use Perst collection classes instead of self-made collections.
Recommendation: avoid references to persistent objects from program variables.
Database class.
In this case it is just enough to mark text fields of the objects which should be included in full text
search index using FullTestIndexable annotation.
When object of such type is stored in the database, Perst will concatenate string values of all this fields
( method is used to convert object value to string) and
inserts object in full text index. alternatively object may implement FullTextSearchable
interface and be self responsible for extracting it's text.
Unlike indexed fields of the object included in normal indices, which update should be handled
by user by invocation of excludeFromIndex prior to the update and
includeInIndex after the update,
in case of update of fields included in full text search
index, it is enough to invoke Database.updateFullTextIndex method after the update.
You can find example of using full text index with Database class in
tst/TestDbServer.java.
Perst+Lucene
Although Perst provides its own full text search capabilities,
it is also possible to integrate Perst with the free open source Lucene search engine.
By default, Lucene stores the full text search index in the file system.
But it is also possible to store the Lucene index in a Perst database (using the Perst Blob class).
Lucene works with an inverse index using the Directory interface.
Default implementations of this interface include a file system and in-memory directory.
Perst provides its own implementation of the Lucene Directory interface allowing to store the index directly in a Perst database.
However, a file system is more efficient for random access to large blocks of data compared to Perst Blobs - the type of access that is actually used by Lucene.
Consequently, performance of the file system -based Lucene search engine can be somewhat better than one using Perst storage. Following are results of a comparison of native file system (WinXP/NTFS) and the Perst implementation of Lucene storage. This test indexes 10,000 documents containing 10 fields with 10 randomly generated words:
Operation Lucene+file system Lucene+Perst Build index 56 sec 84 sec Index search 114 sec 118 sec
Storage db = StorageFactory.getInstance().createStorage();
db.open("file.dbs", pagePoolSize);
Directory directory = new PerstDirectory(db);
Please refer to the Lucene programming API documentation for how to use this directory (it can be used like any other Lucene directory implementation, for example the standard FSDirectory
directory. For instance, to build or update the index you should create an index writer:
IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), create);
Please look at the PerstIndexer.java and PerstSearcher.java examples in the same directory. These examples can work with an index stored in a Perst database, as well as in a standard file system (if
-fs option is specified).Persistent.getOid() method
and vice versa - the persistent object can be accessed by its OID using the Storage.getObjectByOID() method.Continuous
Perst is an embedded database system which
is intended to be used for the storage of personal application data. All the Perst
algorithms were designed to provide the best performance in the case when a single
application is accessing a database. But, nowadays standalone applications are being
replaced more and more by services - Web based server applications, for example.
In such cases, providing concurrent access and high scalability will be the
most important priorities.
org.garret.perst.Database wrapper class, which provides some kind of abstraction of the traditional
interface to a relational database system. Unfortunately, it is not possible to
hide all the details from a programmer and some operations like
excluding/including objects in indices should still be explicitly performed by the
programmer.
class Address extends IValue
{
@FullTextSearchable
private String country;
@FullTextSearchable
private String city;
@FullTextSearchable
private String street;
@Indexable
private int zipCode;
...
}
class Company extends CVersion
{
@FullTextSearchable
@Indexable(unique=true)
private String name;
@FullTextSearchable
private Address location;
...
}
When objects of this class are stored in the database (the transaction is committed), the Continuous package will automatically extract the value of fields marked with @FullTextSearchable, parse out the individual words, and build a document with this text and associate the document with this persistent object. You can search objects in the same way as with the standard Lucene API, and the search result will contain references to Perst documents.
compile.bat script in the perst/continuous/src
directory or using build.xml in perst/continuous and the ant tool. The directory perst/continuous/tst contains two
examples using the Continuous package. The bank example can also be used as a test
for measuring the performance and scalability of Perst with the Continuous
package.Tuning
This section contains several hints on how
to adjust Perst parameters and increase database performance.Storage.open
method (by default it is 4 Mb). Usually, increasing the page pool size leads to
better performance. But you will notice the following things:
-Xmx parameter, but if you do not specify it, the maximal size of the page pool
should not be greater than 32 Mb (at least for Sun's JVM).
Storage.INFINITE_PAGE_POOL constant in the Storage.open method. In this case, the page
pool will be dynamically extended when a new page is requested. As a result,
all pages will be cached and present in memory. So, each page is read from the
disk only once. Also, in this case a strong object cache is used instead of a weak
object cache. It means that all the fetched objects are also pinned in memory
and an object is unpacked only once. It is important to notice that the amount
of used main memory will be greater than the database size: all objects will be
present in memory in packed (inside the page containing the object) and in
unpacked (referenced from the object cache) form.Link, Index, FieldIndex, SpatialIndex,
can still be used by the applications. In this case, you can use the NullFile implementation of the
IFile interface together with Storage.INFINITE_PAGE_POOL to create a transient,
in-memory database. Data in this case will never be written on the disk.StorageImpl class which have influence on the initial and maximal database size. If you
want to change them, you will have to rebuild Perst. Below is a detailed
description of these parameters:
Parameter Default value Description dbDefaultInitIndexSize1024
Initial object index size. The object index is increased on demand.
Reallocation of the index is an expensive operation and so, to minimize the number of such reallocations, the
object index size is always doubled. Specifying a larger initial index size
allows a reduction in the number of future reallocations and so, increases
performance a little bit. But it also leads to a larger initial size of the database
file. With the default value of this parameter, the initial database size is
about 50 kb.
dbDefaultExtensionQuantum4Mb
Database extension quantum.
Memory is allocated by scanning a bitmap. If there is no hole large enough, then the database
is extended by this value. Increasing the value of this parameters leads to
less frequent rescanning of the allocation bitmap from the very beginning. It
leads to faster allocation speeds and better locality of reference for the created
objects (because there is a greater chance that they will be allocated
sequentially). On the other hand, it leads to less efficient memory usage.
Reducing the value of this parameter forces the reuse of existing holes in the
memory allocation bitmap.
dbObjectCacheInitSize1319
Size of object cache. Perst needs this
cache to check if an object with a given OID is already present in memory.
This cache uses weak references to allow the garbage collector to do its
work. When some threshold of filling the cache is reached, the cache is
reallocated by doubling its size. Once again, increasing this parameter can
save some number of cache reallocations.
If your database performs a lot of updates of persistent
data, then the main limiting factor is the speed of writing changes to the disk;
especially a synchronous write to the disk performed by commit. If you will do a
commit after each update, then the average speed will be about 10 updates per
second (this estimation is based on the assumption than average disk access
time is about 10 msec and each transaction commit usually requires writing
about 10 pages at random places in the file). But, it is possible to
dramatically increase update performance if you group several updates in one
transaction. Perst creates a shadow of an object when it is updated the first
time inside a transaction. If an object will be updated once in n transactions,
then n shadows will be created. If an object will be updated n times inside one
transaction, then a shadow will be created only once. This explains the advantage
of having one large transaction.Tricks and Tips
onLoad() method or via the default constructor of the class.
onLoad() method that is called when an object is retrieved from the db.
recursiveLoading method
in some classes. You should explicitly load all objects referenced from the object
with disabled recursive loading.Link, Relation, Index, SortedCollection, IPersistentSet. Since persistent objects in a database are usually accessible through collections, in most cases explicit loading of the persistent objects is not needed. But be careful: if in addition to placing all your objects in some index, you also link them, for example, in an L2-List, then fetching a single object will cause loading of all other objects from this collection!
Perst also doesn't remove any object referenced from the removed object. If this
is needed, the programmer should do it himself.
Explicit deletion of objects can cause two problems:
The first
problem is the most critical and can cause the corruption of database data. To
prevent this problem it is strongly recommended to use the Perst garbage
collector instead of using explicit memory deallocation. If this is not
possible (due to performance or some other reasons), it can still be used for
debugging. Since Perst GC is able to detect both problems: it will cause a StorageError(StorageError.INVALID_OID) exception if a reference to the deleted object is found and return a
non zero number of collected objects if there is garbage in the database. beginThreadTransaction and endThreadTransaction
is not enough for proper concurrent work with persistent objects. The following alternatives are possible:
Please notice
that in alternatives 1-3 only one thread is accessing the database at each
moment of time; so, it is not correct to say it is concurrent execution. But it
doesn't mean that with approach 4 you will get the best performance. This is because
of the locking overhead and alternative B-Tree implementation which is less
efficient than the original implementation for very large databases. Also, approach
4 is the most difficult to program because setting proper locks is the responsibility
of the programmer and incorrect locking can cause synchronization problems:
race conditions or deadlocks (two or more threads mutually block each other).EXCLUSIVE_TRANSACTION mode. This approach is similar to 2), but Perst will provide
exclusion of threads itself.
SERIALIZABLE_TRANSACTION mode + object level locking + alternative B-Tree implementation.
tst/TestServer.java example which illustrates how to use the per-thread transactions
and locking when a database is concurrently accessed from multiple threads.
Arrays.sort method. If the number of loaded objects is too large to fit in the memory, I recommend you load some number of objects which fit in the memory, sort them and then insert them in the index. Then, load the next portion of objects, sort them, insert them in the index and so on.
Storage.createBlob method. BLOBs allow incremental writing and retrieving of BLOB
content. To create a multifile, you should create a multifile description file describing its segments. Each line of this file should contain the path
to the physical file and the size of this segment (in kilobytes). Size of the segment
should be aligned on the database page boundary (it should contain an integer
number of pages). For the last segment, the size can be skipped. The file can
look something like this (tst/testalloc.mfd):
"testalloc1.dbs" 1125899906842624
"testalloc2.dbs"
To open a multifile
it is necessary to pass the Storage.open method path to the
multifile description file prepended by the '@' character.testalloc.mfd
the multifile description shown above the size of the first segment,
is set to 2^60 bytes. The first segment will be used to store all the database
objects except BLOBs. Since 2^60 is very large number, we should not be afraid
that at some point in time, the space in the first segment will get exhausted
(space on the disk gets exhausted earlier).org.garret.perst.CustomAllocator interface. Programmers can provide their own implementation of this
interface or create a bitmap allocator using the
Storage.createBitmapAllocator method. This method takes four parameters: size of the allocation
quantum, base address of the allocator, extension quantum, and the limit of
allocated space. By setting the allocator's base address equal to the base
address of the multifile segment ((1L << 60) in case of above multifile
definition). Then, it is necessary to register the allocator. This should be
done using the Storage.registerCustomAllocator method,
associating this allocator with the particular class. Space in the storage,
for all instances belonging to this class or derived from it,
will be allocated using this allocator. If we want to separate
BLOBs from other data we should bind this allocator with the BLOB class.tst/TestAlloc.java example which illustrates how it is possible to store files in a Perst
database. You will see that not a very big effort is needed to efficiently
store BLOBs.CompressedFile which uses the ZLIB compression library. This file can be used
in the read-only mode, so, the database should be prepared previously.org.garret.perst.CompressDatabase
utility to compress the database file. This is a command line
utility. The first mandatory parameter specifies the path to the database file
and the second optional parameter specifies the compression level. This utility
creates a compressed file in the same directory where the original file is
located but with the ".dbz" extension. This utility splits files into
128 kb blocks and compresses each block separately. You can open a compressed
file using a standard zip utility. Inside, you will find several entries:
000000000000, 000000131072, 000000262144 . .
. . Each entry corresponds to a 128 kb block of the
original file. If you extract and concatenate all of them, you will get the original
database file. When the compressed file is ready, you can open it using org.garret.perst.CompressedFile.TestBlob. Then, you
should compress the database using the CompressDatabase
utility. After this you can run TestBlob zip which will extract data
from the compressed database. In this example, the original database file size is
315392 bytes and the size of the compressed database is 24436 bytes.Storage.open method which
takes a cipher key:
public void open(String filePath, long pagePoolSize, String cipherKey);
Please note that if in addition to using data encryption, you also want to
use database compression, then you should use a specialized implementation
of the database file instead of using the Storage.open method
with the cipher key parameter. This is described in the following
section.StorageImpl.dbAllocationQuantumBits and is 32 bytes by default.
Therefore, if your object contains only one field of type int (4 bytes),
then the size of the object, with object header, is 4+8=12 bytes. But the
bitmap allocator allocates 32 bytes for this object on disk. So instead of
just 4 bytes, Perst will use 32 bytes in the storage.TimeSeries class, so that small fixed-size objects will be
allocated in blocks containing large numbers of elements. But a drawback to
this method is that it is impossible to access time series elements directly
- you can only locate an element or range of elements by timestamp (any
monotonically increasing key).CompressedReadWriteFile:
Storage db = StorageFactory.getInstance().createStorage();
db.open(new CompressedReadWriteFile("mydatabase.dbs", CIPHER_KEY), PAGE_POOL_SIZE);
In Perst.Lite, use the perst-rms.jar library and
CompressedFile class:
db = StorageFactory.getInstance().createStorage();
db.open(new CompressedFile("mydatabase.dbz", new JZlibEncryptedCompressor(cipherKey)), PAGE_POOL_SIZE);
Please note that in the examples above, a cipher key is also specified.
Although database compression and database encryption are independent of one
another, and Perst is able to provide database encryption without database
compression, compression should take place before encryption.
Otherwise, compression will have almost no effect on database size, since
encryption eliminates regularities in the data stream. This is why
compressed database files also provides data encryption (if you do not want
to use encryption, just pass null as the cipher key or use
JZlibCompressor instead of JZlibEncryptedCompressor).
OutOfMemoryException exception
in such cases by suspending user threads and making it possible for finalization and garbage
collection threads to complete their work and free enough memory. But with some JVM implementations,
in very rare situations under heavy load, such a situation is unfortunately possible. This can only happen in the standard version of Perst, not in Perst.Lite,
since finalization is not supported in J2ME (Java ME)."perst.object.cache.init.size" property ("perst.object.cache.pin.count" in Perst.Lite). The default value is 1319 for the standard Perst version and 100 for Perst.Lite.Storage.open method. The default value is 4Mb for standard Perst
and 64kb for Perst.Lite.
class MyClass {
MyClass next;
...
}
When such an object is loaded in memory, it contains a reference to the next object which will be recursively proceeded and also loaded into memory. And the next one, and the next one... As the result, the entire chain of objects is loaded into memory. If a single reference to the first item in the list is kept in some program variable, none of the objects can be deallocated by the GC, since all of them are "strongly" referenced.Persistent.recursiveLoading method in MyClass and handle loading of instances of this class explicitly using the Persistent.load method. However, if the reference to the first item in the list is pinned, this approach will not help solve the problem of all loaded objects being pinned in memory. Generally speaking, the solution is to use direct references only to implement object associations in small groups, and not use references with large data collections. In the latter case, you should use the Perst collection classes which load collection members on demand and do not keep strong references to objects. That allows the GC to deallocate them.