databene

 

benerator file format

The benerator configuration file is XML based. An XML schema is provided. The document root is a setup element:

<?xml version="1.0" encoding="iso-8859-1"?>
<setup xmlns="http://databene.org/benerator-0.7.0.xsd"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://databene.org/benerator-0.7.0.xsd benerator-0.7.0.xsd">
    <!-- content here -->
</setup>

benerator files should end with the suffix .ben.xml.

properties

You can define global properties:

        <property name="my_name" value="Volker" />

or import several of them from a properties file:

        <include uri="my.properties" />

javabeans and the context

You can instantiate JavaBeans by an intuitive syntax like this:

<bean id="db" class="com.my.SpecialBean">
    <property name="user" value="benerator"/>
    <property name="password" value="benerator"/>
</bean>

The class attribute denotes which JavaBean class to instantiate (using its default constructor). The enclosed property tags set the JavaBean's properties to the given values. Benerator converts common types automatically. For types it cannot convert, you may define a custom ConverterManager setup (see databene-commons). Date and time formatting is supported according to ISO 8601 conventions.

Objects are made available by exposing them in a context. The id attribute defines the name with which an object can be found, e.g. for a 'source' or 'ref' attribute of another element's setup.

So the example above creates an instance of a DBSystem JavaBean class and sets its properties to the values needed for connecting to a database. The object can be retrieved from the context by the id 'db'.

Note: The class DBSystem implements the interface 'System' which provides (among other features) meta information about the entities (tables) contained in the database.

You can refer to objects declared earlier by using a 'ref' attribute in the bean declaration. The run-task example in the 'tasks' section shows this for a task setup, but the same can be applied to beans and consumers as well.

Note: You may implement the System interface for connecting to other system types like SAP or Siebel systems.

JavaBeans may refer to each other and may have collection or attribute properties, as shown in the following example:

<bean id="csv" class="org.databene.platform.csv.CSVEntityExporter">
     <property name="uri" value="customers.csv"/>
     <property name="properties" value="salutation,first_name,last_name"/>
</bean>

<bean id="proxy" class="shop.MyProxy">
     <property name="target" ref="csv"/>
</bean>

<bean id="log-csv" class="org.databene.model.consumer.ConsumerChain">
    <property name="components">
        <bean class="org.databene.model.consumer.LoggingConsumer"/>
        <idref bean="proxy"/>
    </property>
</bean>

Databases can be defined using a <database> element:

<database id="db"
    url="jdbc:mysql://localhost/benerator"
    driver="com.mysql.jdbc.Driver"
    schema="benerator"
    user="benerator" password="benerator" batch="true" />

 

tasks

<run-task class="org.databene.platform.db.adapter.RunSqlScriptTask">
     <property name="uri" value="shop/create_tables.mysql.sql"/>
     <property name="db" ref="db"/>
</run-task>

 

Besides the general 'bean' element, other elements also create JavaBeans; their element name and additional attributes tell benerator how to process the created object.

The example above tells benerator to create a JavaBean of class 'RunSqlScriptTask' with the uri 'shop/create_tables.mysql.sql' and its property 'db' referring to the JavaBean 'db' in the context. Finally, the task is executed.

You may define custom tasks to suit your needs, e.g. for performing health checks, by implementing the interface 'org.databene.task.Task'. Through this interface, a Task declares whether it is thread-safe or at least parallelizable.

The element run-task also supports the attributes

  • count: the total number of times the Task is executed (defaults to 1)
  • pagesize: the number of invocations to execute 'en bloc' (defaults to 1)
  • threads: the number of threads with which to execute the Task (defaults to 1)
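
As a hedged illustration of these attributes, the run-task element could be configured as follows. The class com.my.HealthCheckTask and its 'db' property are hypothetical (a custom Task implementation, not part of benerator), and the numbers are arbitrary.

```xml
<!-- Sketch: com.my.HealthCheckTask is a hypothetical class implementing org.databene.task.Task -->
<run-task class="com.my.HealthCheckTask" count="1000" pagesize="100" threads="4">
    <!-- 'db' refers to an object declared earlier in the context -->
    <property name="db" ref="db"/>
</run-task>
```

Under these assumptions the task would be invoked 1000 times in total, in blocks of 100 invocations, using 4 threads. Using more than one thread only makes sense if the Task declares itself thread-safe or parallelizable.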

importing entities

Entities can be imported from systems, files or other generators. A typical application is to (re)use a DbUnit setup file from your (hopefully existing ;-) unit tests:

<!-- import basic setup from a DBUnit file -->
<iterate source="shop/shop.dbunit.xml" consumer="db"/>

For importing DbUnit files, follow the naming conventions using the suffix .dbunit.xml.

Each created entity is forwarded to one or more consumers, which usually persist objects in a file or system, but may also be used to post-process created entities. The specified object needs to implement the Consumer or the System interface. When a system is specified here, it is used to store the entities. File exporters (for CSV and flat files) implement the Consumer interface.
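
The following sketch shows how more than one consumer might be attached to a single import. It only reuses elements that appear elsewhere in this document; that nested consumer elements can be combined inside iterate in exactly this way is an assumption.

```xml
<!-- Sketch: forward each imported entity to the database and to a logger -->
<iterate source="shop/shop.dbunit.xml">
    <!-- a System (here: the object with id 'db') stores the entities -->
    <consumer ref="db"/>
    <!-- a plain Consumer, here used to log every imported entity -->
    <consumer class="org.databene.model.consumer.LoggingConsumer"/>
</iterate>
```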

custom importers

New import formats can be supported by implementing the EntitySource interface with a JavaBean implementation, instantiating it as a bean and referring to it by its id with a 'source' attribute, e.g.

<bean id="products_flat" class="org.databene.platform.flat.FlatFileEntitySource">
     <property name="uri" value="shop/products.import.flat"/>
     <property name="entity" value="product"/>
     <property name="properties" value="ean_code[13],name[30],category_id[9],price[8r0],manufacturer[30]"/>
</bean>

<iterate name="product" source="products_flat">
     <consumer class="org.databene.model.consumer.LoggingConsumer"/>
</iterate>

 

 

 

chaining generators

Generators may be chained, composed, or reused in different contexts. You can do so by instantiating a generator as a JavaBean and referring to it in properties of other JavaBean-instantiated generators, or by specifying it in a 'source' attribute like an importer.

<!-- creates a text generator -->
<bean id="textGen" class="org.databene.benerator.primitive.regex.RegexStringGenerator">
     <property name="pattern" value="([a-z]{3,8}[ ])*[a-z]{3,8}\."/>
</bean>

<!-- wraps the text generator and creates messages -->
<generate name="message" count="10">
    <attribute name="text" source="textGen"
        converter="org.databene.model.converter.MessageConverter" pattern="Message: ''{0}''"/>
    <consumer class="org.databene.model.consumer.LoggingConsumer"/>
</generate>

 

 

creating random entities

Entities can be generated without any input files: Benerator provides a rich set of Generator implementations. When using generate, the registered systems (e.g. the database) are queried for meta data. Benerator interprets the meta data and automatically sets up generators that match the systems' constraints, like column length, referenced entities and more. By default, associations are treated as many-to-one associations.

    <!-- create products of random attribs & category -->
    <generate name="db_product" count="1000" pagesize="100">
        <consumer ref="db"/>
    </generate>

Entities are generated as long as every attribute generator is available, up to the limit specified in the 'count' attribute. The 'pagesize' defines the number of created entities after which a flush() is applied to all consumers (for a database system this is mapped to a commit).

 

nesting entities

Entities can form composition structures, which are generated best by recursive generate structures.

TODO: example
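
Until a full example is available, here is a minimal sketch of what a recursive generate structure might look like. The entity names db_order and db_order_item and the column order_id are hypothetical, a database 'db' is assumed to be declared already, and linking the child to its parent through a script is an assumption based on the scripting features described later in this document.

```xml
<!-- Sketch: for each generated order, generate three order items -->
<generate name="db_order" count="10" consumer="db">
    <!-- the inner generate is assumed to run once per generated db_order -->
    <generate name="db_order_item" count="3" consumer="db">
        <!-- assumption: the script resolves to the id of the last generated db_order -->
        <attribute name="order_id" script="{${db_order.id}}"/>
    </generate>
</generate>
```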

 

exporting generated data to data files

You will need to reuse some of the generated data for setting up (load) test clients. You can simply export data by an appropriate consumer:

    <!-- create products of random attribs & category -->
    <generate name="db_product" count="1000" pagesize="100">
        <consumer ref="db"/>
        <consumer class="org.databene.platform.fixedwidth.FixedWidthEntityExporter">
            <property name="uri" value="products.flat"/>
            <property name="properties" value="ean_code[13],name[30l],price[10r0]"/>
        </consumer>
    </generate>

 

imposing one-field business constraints

Simple constraints, e.g. formats, can be ensured by defining an appropriate Generator or regular expression, e.g.

    <!-- create products of random attribs & category -->
    <generate name="db_product" count="1000" pagesize="100">
        <attribute name="ean_code" generator="org.databene.domain.product.EANGenerator"/>
        <attribute name="name" pattern="[A-Z][A-Z]{5,12}"/>
        <consumer ref="db"/>
    </generate>

 

imposing multi-field-constraints

For supporting multi-field-constraints, you can provide a Generator (with a variable element) that creates entities, JavaBeans or Maps. This may be e.g. a random generator or an importing generator. On each generation run, an instance is generated and made available to the other sub generators. They can use the entity or sub elements by a source path attribute:

    <generate name="db_customer">
        <variable name="person" generator="org.databene.domain.person.PersonGenerator" dataset="DE"/>
        <attribute name="salutation" source="person.salutation"/>
        <attribute name="first_name" source="person.givenName"/>
        <attribute name="last_name" source="person.familyName"/>
        <consumer ref="db"/>
    </generate>

The source path may be composed of property names, map keys and entity features, separated by a dot.

 

Using databases

You can easily define a database:

    <database id="db" url="jdbc:hsqldb:hsql://localhost" driver="org.hsqldb.jdbcDriver" user="sa" batch="false"/>

A database must have an id by which it can be referenced later.
When starting a project, it is better to use batch="false". In this mode, database errors are easier to track.
 
SQL code can be executed, e.g. from a file:
    <execute uri="drop-tables" target="db" onError="warn"/>
 
 or inline:
 
    <execute target="db" type="sql" onError="warn">
        CREATE TABLE db_role (
          id   int         generated by default as identity (start with 1) NOT NULL,
          name varchar(16) NOT NULL,
          PRIMARY KEY (id)
        );
    </execute>
 
onError determines the log level at which errors are reported. With 'fatal', benerator execution is stopped. You can use: ignore, trace, debug, info, warn, error, fatal.
URIs are resolved relative to the benerator file that declares them (as is common in HTML). If the file is not found there, it is searched for relative to the current working directory.
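
As an illustration of the onError levels, the sketch below treats a failing cleanup script as harmless but stops the run if table creation fails. The file names are hypothetical.

```xml
<!-- dropping tables may fail on a fresh database: only log a warning -->
<execute uri="drop_tables.sql" target="db" onError="warn"/>
<!-- without tables, generation is pointless: stop benerator on error -->
<execute uri="create_tables.sql" target="db" onError="fatal"/>
```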
 

default column settings

Usually most tables have common column names, e.g. for ids or audit data. You can specify default settings by column name:
 
    <defaultComponents>
        <id name="ID" type="long" generator="IncrementalIdGenerator"/>
        <attribute name="SNAPSHOT_NUMMER" nullQuota="1"/>
        <attribute name="VERSION" values="1"/>
        <attribute name="CREATEDDATE" generator="org.databene.benerator.primitive.datetime.CurrentDateGenerator"/>
        <attribute name="CREATEDBY" script="benutzer1"/>
        <attribute name="LASTUPDATED" generator="org.databene.benerator.primitive.datetime.CurrentDateGenerator"/>
        <attribute name="LASTUPDATEDBY" script="benutzer1"/>
    </defaultComponents>
 
If a table has a column that is not configured in the benerator descriptor but has a defaultComponent entry, benerator uses the defaultComponent configuration. If no defaultComponent configuration exists either, benerator falls back to a useful standard setting.
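
A short sketch of how this fallback might play out, assuming a hypothetical table db_user that has the audit columns used in the defaultComponents example. That an explicit setting in the generate element takes precedence over the default is inferred from the rule above, not stated explicitly.

```xml
<generate name="db_user" count="100" consumer="db">
    <!-- explicit setting: assumed to be used instead of the VERSION default -->
    <attribute name="VERSION" values="2"/>
    <!-- ID, CREATEDDATE, CREATEDBY etc. are not configured here,
         so the defaultComponents settings apply to them -->
</generate>
```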
 

creating entities

Thanks to benerator's many useful defaults, the initial configuration requires minimal effort:

    <generate name="db_role" count="10" consumer="db" />
    <generate name="db_user" count="100" consumer="db" />

Id generation defaults to an increment strategy, and useful defaults are chosen for all other columns.

 

resolving relations

If you run the example above, you will get a strange-looking result: You get only 10 db_users though you configured 100.

This is caused by one of benerator's defaults: benerator does not know whether the relation user-role is one-to-one or many-to-one, so it decides to use one-to-one to avoid problems.

If you want a many-to-one relationship, you need to specify its characteristics, e.g. by a distribution:

    <generate name="db_role" count="10" consumer="db" />
    <generate name="db_user" count="100" consumer="db">
        <reference name="role_fk" targetType="db_role" source="db" distribution="random"/>
    </generate>

This will cause creation of 100 users which are evenly distributed over the roles.

You can also configure each role type separately, e.g.

    <generate name="db_role" count="10" consumer="db" />

    <generate name="db_user" count="5" consumer="db">
        <attribute name="role_fk" values="admin"/>
    </generate>

    <generate name="db_user" count="95" consumer="db">
        <attribute name="role_fk" values="customer"/>
    </generate>

Though role_fk is a reference, you can use all features available for <attribute> configuration.
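
For instance, the selector feature described under 'querying information from a system' could be applied to the reference column. This is a sketch that assumes role_fk refers to the id column of db_role.

```xml
<generate name="db_user" count="100" consumer="db">
    <!-- cycle through the ids of the existing roles -->
    <attribute name="role_fk" source="db" selector="select id from db_role" cyclic="true"/>
</generate>
```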

 

Scripting

As of benerator 0.5.5, there is experimental support for binding scripting languages.

The invocation syntax is as described for SQL invocation and inlining.

    <execute type="js">
        importPackage(org.databene.model.data);
        print('Hello ' + benerator.getContext().get('user').get('name') + '!');
        print('DB-URL' + db.getUrl());
        var alice = new Entity('TT', 'id', '2', 'name', 'Alice');
        db.store(alice);
        db.flush();
    </execute>

You can bind a language of choice by using the mechanisms of JSR 223: Scripting for the Java Platform.

With Java 6 for Windows, a JavaScript implementation is shipped. For all other platforms and languages you need to configure language support.

There is no connection to benerator internals yet. Since the only scripting used so far was FreeMarker, I will need to resolve some differing concepts and make some major changes with release 0.6.0.

 

data types

The following data types are supported:

benerator type   JDBC type name        JDBC type value   Java type
byte             Types.TINYINT         -6                java.lang.Byte
                 Types.BIT             -7
short            Types.SMALLINT        5                 java.lang.Short
int              Types.INTEGER         4                 java.lang.Integer
big_integer      Types.BIGINT          -5                java.math.BigInteger
float            Types.FLOAT           6                 java.lang.Float
double           Types.DOUBLE          8                 java.lang.Double
                 Types.NUMERIC         2
                 Types.REAL            7
big_decimal      Types.DECIMAL         3                 java.math.BigDecimal
boolean          Types.BOOLEAN         16                java.lang.Boolean
char             Types.CHAR            1                 java.lang.Character
date             Types.DATE            91                java.util.Date
                 Types.TIME            92
timestamp        Types.TIMESTAMP       93                java.sql.Timestamp
string           Types.VARCHAR         12                java.lang.String
                 Types.LONGVARCHAR     -1
                 Types.CLOB            2005
object (TODO)    Types.JAVA_OBJECT     2000              java.lang.Object
binary           Types.BINARY          -2                byte[]
                 Types.VARBINARY       -3
                 Types.LONGVARBINARY   -4
                 Types.BLOB            2004
(specific)       Types.OTHER           1111              (specific)
n/a              Types.DATALINK        70                n/a
                 Types.NULL            0
                 Types.DISTINCT        2001
                 Types.STRUCT          2002
                 Types.ARRAY           2003
                 Types.REF             2006

querying information from a system

Arbitrary information may be queried from a system by a 'selector' attribute, which is system-dependent. For a database SQL is used:

    <generate name="db_order" count="30" pagesize="100">
        <attribute name="id" mode="ignored"/>
        <attribute name="customer_id" source="db" selector="select id from db_customer" cyclic="true"/>
        <consumer ref="db"/>
    </generate>

The result set of a selector might be quite large, so different strategies (for wrapping any other generator's output) are supported:

  • distribution: Maps to the name of a Sequence or WeightFunction class. For this, the complete result set is loaded into RAM. A Sequence should not be applied to result sets of more than 100,000 elements; a WeightFunction should be restricted to at most 10,000 elements.
  • proxy="skip" or proxy="repeat" for iterating sequentially through the set. 'proxy-param1' and 'proxy-param2' may be used to specify minimum and maximum of repetitions or skipped elements. If cyclic="true", the result set will be re-iterated from the beginning when it has reached the end.
    <generate name="db_order_item" count="100" pagesize="100">
        <attribute name="id" mode="ignored"/>
        <attribute name="number_of_items" min="1" max="27" distribution="cumulated"/>
        <attribute name="order_id" source="db" selector="select id from db_order" cyclic="true"/>
        <attribute name="product_id" source="db" selector="select ean_code from db_product" distribution="random"/>
        <consumer ref="db"/>
    </generate>
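
The proxy strategy could be sketched as follows, assuming that proxy-param1 and proxy-param2 are written as plain attributes of the attribute element and are interpreted as the minimum and maximum number of repetitions:

```xml
<generate name="db_order_item" count="100" pagesize="100">
    <attribute name="id" mode="ignored"/>
    <!-- iterate sequentially through the order ids, repeating each one
         (minimum 1, maximum 3 repetitions) and restarting at the end -->
    <attribute name="order_id" source="db" selector="select id from db_order"
        proxy="repeat" proxy-param1="1" proxy-param2="3" cyclic="true"/>
    <consumer ref="db"/>
</generate>
```
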
You can use script expressions in your selectors, e.g.
    selector="{select ean_code from db_product where country='${country}'}"

The script is resolved immediately before the first generation and then reused.

If you need dynamic queries that are re-evaluated, you can specify them with double curly braces:

selector="{{select ean_code from db_product where country='${shop.country}'}}"

Example:

 

    <generate name="shop" count="10">
        <attribute name="country" values="DE,AT,CH"/>
        <generate name="product" count="100" consumer="db">
            <attribute name="ean_code" source="db" selector="{{select ean_code from db_product where country='${shop.country}'}}"/>
        </generate>
    </generate>

entity definition

id definition

attribute definition

 

all supported generator attributes

name
name of the feature to generate
type
type of the feature to generate
nullable
tells if the feature may be null
mode
controls the processing mode: (normal|ignored|secret)
pattern
uses a regular expression for String creation or date format pattern for parsing Dates.
generator
uses a Generator instance for data creation
values
provides a comma-separated list of values to choose from
nullQuota
the quota of null values to create
converter
the class name of a Converter to apply to the generated objects
dataset
a (nestable) set to create data for, e.g. dataset="US" for the United States
locale
a locale to create data for, e.g. locale="de" 
offset
the number of elements to skip at the top of a generated/iterated product sequence
unique
whether to assure uniqueness, e.g. unique="true". Since this needs to keep every instance in memory, use is restricted to 100,000 elements. For larger numbers you should use Sequence-based algorithms.
source
A system, EntityIterator or file to import data from.
selector
A system-dependent selector to query for data.
trueQuota
the quota of true values created by a Boolean Generator.
min
the minimum Number or Date to generate
max
the maximum Number or Date to generate
precision
the resolution of Numbers or Dates to generate
distribution
the distribution to use for Number or Date generation. This may be a Sequence name or a WeightFunction class name.
minLength
the minimum length of the Strings that are generated
maxLength
the maximum length of the Strings that are generated
cyclic
auto-resets the generator after it has gone unavailable
proxy
wraps a generator with a proxy (skip|repeat), which skips or repeats products
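
To show how some of these attributes combine, here is a sketch for a hypothetical db_product table. The column names and values are invented for illustration, the type names are taken from the data types table, and quotas are assumed to be fractions between 0 and 1.

```xml
<generate name="db_product" count="1000" pagesize="100" consumer="db">
    <!-- string built from a regular expression, requested to be unique -->
    <attribute name="ean_code" type="string" pattern="[0-9]{13}" unique="true"/>
    <!-- number between min and max with a resolution of 'precision' -->
    <attribute name="price" type="big_decimal" min="0.49" max="99.99" precision="0.01"/>
    <!-- one of the listed values -->
    <attribute name="category" values="FOOD,TOOLS,TOYS"/>
    <!-- assumed meaning: about 30% of the generated values are true -->
    <attribute name="in_stock" type="boolean" trueQuota="0.3"/>
    <!-- assumed meaning: about 10% of the generated values are null -->
    <attribute name="description" type="string" minLength="10" maxLength="80" nullQuota="0.1"/>
</generate>
```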
 

reference definition

 

Scripting

Scripts are supported in

  • benerator setup files
  • properties files
  • DbUnit XML files
  • CSV files
  • Flat files

    A script is denoted by curly braces, e.g. '{Hi, I am ${my_name}}'. This syntax will use the default script engine for rendering the text as, e.g. 'Hi, I am Volker'.

    The default script engine is set by the property benerator.defaultScript.

    If you need to support different script engines (e.g. when combining files from different sources), you can distinguish them by prepending the script engine id, e.g. '{ftl:Hi, I am ${my_name}}' or '{Vel:Hi, I am ${my_name}}'.
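
    For example, two properties in the same setup file could address different engines. This sketch assumes that an engine with the id 'ftl' is available in your installation; the property names greeting_ftl and greeting_default are hypothetical.

```xml
<property name="my_name" value="Volker"/>
<!-- rendered by the engine registered under the id 'ftl' -->
<property name="greeting_ftl" value="{ftl:Hi, I am ${my_name}}"/>
<!-- rendered by the default script engine (benerator.defaultScript) -->
<property name="greeting_default" value="{Hi, I am ${my_name}}"/>
```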

    Scripts in the benerator setup are evaluated while parsing. If you need to dynamically generate script text at runtime, use the script attribute of an attribute element:

     <attribute name="total_price" script="{${(product[1] * db_order_item.number_of_items)?c}}" />

    With scripts you can access

  • environment variables
  • JVM parameters
  • any JavaBean globally declared in the benerator setup
  • the last generated entity of each type
  • variable values

    Variable names in scripting may not contain dots: a dot always implies navigation, e.g. person.familyName navigates from the person object to the familyName attribute/property/key.


staging

Combining scripting and property files, you get a staging mechanism, which is demonstrated in the shop demo. Check the file demo/shop/shop.ben.xml in your benerator installation. It uses staging for populating all seven supported databases with the same benerator setup file, moving database-specific code to small properties files.

When invoking benerator with a -Dstage=development JVM parameter, you can make your import resolve to the properties file of that stage:

<include uri="{demo/shop/shop.${stage}.properties}" />
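
A sketch of what the rest of such a staged setup might look like. The property names db_url, db_driver, db_user and db_password are hypothetical and would be defined in each stage's properties file (e.g. demo/shop/shop.development.properties); the actual shop demo may use different names.

```xml
<!-- picks shop.development.properties when started with -Dstage=development -->
<include uri="{demo/shop/shop.${stage}.properties}"/>

<!-- database settings come from the included properties file -->
<database id="db" url="{${db_url}}" driver="{${db_driver}}"
    user="{${db_user}}" password="{${db_password}}" batch="false"/>
```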

 

template support

You can use DbUnit import files for replicating entity graph structures many times on each generated object. Say, for each customer in a tested online shop, a default order structure should be created. You would then define the order structure in a DbUnit file

<dataset>
    <db_order_item order_id="{${db_order.id}}" number_of_items="2" product_ean_code="8076800195057" total_price="2.40" />
    <db_order_item order_id="{${db_order.id}}" number_of_items="1" product_ean_code="8006550301040" total_price="8.70" />
</dataset>

and then create an order for each customer that imports its sub structure from the DbUnit file:

    <generate name="db_order" consumer="db">
        <id name="id" generator="IncrementalIdGenerator" />
        <attribute name="customer_id" source="db" selector="select id from db_customer" />
        <iterate name="db_order_item" source="demo/shop/default_order.dbunit.xml" consumer="db">
            <id name="id" generator="IncrementalIdGenerator" />
        </iterate>
    </generate>

Of course, you have to take care of appropriate ids yourself.