benerator file format
The benerator configuration file is XML-based. An XML schema is provided. The document root is a setup element:
<?xml version="1.0" encoding="iso-8859-1"?>
<setup xmlns="http://databene.org/benerator-0.7.0.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://databene.org/benerator-0.7.0.xsd benerator-0.7.0.xsd">
<!-- content here -->
</setup>
benerator files should end with the suffix .ben.xml.
properties
You can define global properties:
<property name="my_name" value="Volker" />
or import several of them from a properties file:
<include uri="my.properties" />
javabeans and the context
You can instantiate JavaBeans by an intuitive syntax like this:
<bean id="db" class="com.my.SpecialBean">
<property name="user" value="benerator"/>
<property name="password" value="benerator"/>
</bean>
The class attribute denotes which JavaBean class to instantiate (using the default constructor). The enclosed property tags cause the JavaBean's properties to be set to appropriate values. Benerator converts common types automatically. If a type is not converted automatically, you may define a custom ConverterManager setup (see databene-commons). Date and time formatting is supported according to ISO 8601 conventions.
Objects are made available by exposing them in a context. The id attribute defines the name with which an object can be found, e.g. for a 'source' or 'ref' attribute of another element's setup.
So the example above creates an instance of a DBSystem JavaBean class, setting its properties to values for connecting a database. The object is retrievable by the context with the id 'db'.
Note: The class DBSystem implements the interface 'System' which provides (among other features) meta information about the entities (tables) contained in the database.
You can create references to other objects declared before by a 'ref'-attribute in the bean declaration. The following example shows this for a task setup, but this can be applied to beans and consumers as well.
Note: You may implement the System interface for connecting to other system types like SAP or Siebel systems.
JavaBeans may refer to each other and may have collection or attribute properties as shown in the following example:
<bean id="csv" class="org.databene.platform.csv.CSVEntityExporter">
Databases can be defined using a <database> element:
<database id="db"
tasks
<run-task class="org.databene.platform.db.adapter.RunSqlScriptTask">
<property name="uri" value="shop/create_tables.mysql.sql"/>
<property name="db" ref="db"/>
</run-task>
Besides the general 'bean' element, other elements instantiate objects as well, using the element name and additional attributes as processing information.
The example above creates a JavaBean of class 'RunSqlScriptTask', sets its uri to 'shop/create_tables.mysql.sql' and makes its property 'db' refer to the JavaBean 'db' in the context. Finally, the task is executed.
You may define custom tasks to suit your needs, e.g. for performing health checks, by implementing the interface 'org.databene.task.Task'. Through this interface, a Task declares whether it is thread-safe or at least parallelizable.
The element run-task also supports the attributes
- count: the total number of times the Task is executed (defaults to 1)
- pagesize: the number of invocations to execute 'en bloc' (defaults to 1)
- threads: the number of threads with which to execute the Task (defaults to 1)
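As a rough sketch of how these attributes combine, the following fragment would run a custom task 1000 times, in blocks of 100, on four threads. The class name com.my.HealthCheckTask is hypothetical and stands for any thread-safe implementation of org.databene.task.Task. The 'db' property and the bean with id 'db' are assumptions as well.

```xml
<!-- com.my.HealthCheckTask is a hypothetical Task implementation -->
<run-task class="com.my.HealthCheckTask" count="1000" pagesize="100" threads="4">
    <!-- assumes the task exposes a 'db' property and a bean with id 'db' exists in the context -->
    <property name="db" ref="db"/>
</run-task>
```

Using more than one thread only makes sense if the task declares itself thread-safe or parallelizable.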
importing entities
Entities can be imported from 'system's, files or other generators. A typical application is to (re)use a DBUnit setup file from your (hopefully existing ;-) unit tests:
<!-- import basic setup from a DBUnit file -->
For importing DbUnit files, follow the naming conventions using the suffix .dbunit.xml.
Each created entity is forwarded to one or more consumers, which usually will persist objects in a file or system, but might also be used to post-process created entities. The specified object needs to implement the Consumer or the system interface. When specifying a system here, it will be used to store the entities. File exporters (for CSV and Flat Files) implement the Consumer interface.
custom importers
New import formats can be supported by implementing the EntitySource interface with a JavaBean implementation, instantiating it as a bean and referring to it by its id with a 'source' attribute, e.g.
<bean id="products_flat" class="org.databene.platform.flat.FlatFileEntitySource">
chaining generators
Generators may be chained, composed, or reused in different contexts. You can do so by instantiating a generator as a JavaBean and referring to it in properties of other JavaBean-instantiated generators or specifying it as 'source' attribute like an importer.
<!-- creates a text generator -->
creating random entities
Entities can be generated without any input files: Benerator provides a rich set of Generator implementations. When using generate, the registered systems (e.g. the database) are queried for meta data. Benerator interprets the meta data and automatically sets up generators that match the systems' constraints, like column length, referenced entities and more. By default, associations are treated as many-to-one associations.
<!-- create products of random attribs & category -->
<generate name="db_product" count="1000" pagesize="100">
<consumer ref="db"/>
</generate>
Entities are generated as long as each attribute generator is available and limited by the number specified in the 'count' attribute. The 'pagesize' defines the number of creations after which a flush() is applied to all consumers (for a database system this is mapped to a commit).
nesting entities
Entities can form composition structures, which are best generated by nested generate structures.
TODO: example
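A minimal sketch of such a nested structure, modeled on the nested generate elements and script expressions used elsewhere in this document. The entity and attribute names are assumptions, not taken from a shipped demo.

```xml
<!-- sketch only: entity and attribute names are assumptions -->
<generate name="db_order" count="10" consumer="db">
    <!-- the inner generate is nested in the outer one and can refer to the current db_order -->
    <generate name="db_order_item" count="3" consumer="db">
        <attribute name="order_id" script="{${db_order.id}}"/>
    </generate>
</generate>
```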
exporting generated data to data files
You will need to reuse some of the generated data for setting up (load) test clients. You can simply export data by an appropriate consumer:
<!-- create products of random attribs & category -->
<generate name="db_product" count="1000" pagesize="100">
<consumer ref="db"/>
<consumer class="org.databene.platform.fixedwidth.FixedWidthEntityExporter">
<property name="uri" value="products.flat"/>
<property name="properties" value="ean_code[13],name[30l],price[10r0]"/>
</consumer>
</generate>
imposing one-field business constraints
Simple constraints, e.g. formats, can be ensured by defining an appropriate Generator or regular expression, e.g.
<!-- create products of random attribs & category -->
<generate name="db_product" count="1000" pagesize="100">
<attribute name="ean_code" generator="org.databene.domain.product.EANGenerator"/>
<attribute name="name" pattern="[A-Z][A-Z]{5,12}"/>
<consumer ref="db"/>
</generate>
imposing multi-field-constraints
For supporting multi-field-constraints, you can provide a Generator (with a variable element) that creates entities, JavaBeans or Maps. This may be e.g. a random generator or an importing generator. On each generation run, an instance is generated and made available to the other sub generators. They can use the entity or sub elements by a source path attribute:
<generate name="db_customer">
<variable name="person" generator="org.databene.domain.person.PersonGenerator" dataset="DE"/>
<attribute name="salutation" source="person.salutation"/>
<attribute name="first_name" source="person.givenName"/>
<attribute name="last_name" source="person.familyName"/>
<consumer ref="db"/>
</generate>
The source path may be composed of property names, map keys and entity features, separated by a dot.
Using databases
You can easily define a database:
<database id="db" url="jdbc:hsqldb:hsql://localhost" driver="org.hsqldb.jdbcDriver" user="sa" batch="false"/>
<execute type="sql" target="db">
CREATE TABLE db_role (
id int generated by default as identity (start with 1) NOT NULL,
name varchar(16) NOT NULL,
PRIMARY KEY (id)
);
</execute>
default column settings
<defaultComponents>
<id name="ID" type="long" generator="IncrementalIdGenerator"/>
<attribute name="SNAPSHOT_NUMMER" nullQuota="1"/>
<attribute name="VERSION" values="1"/>
<attribute name="CREATEDDATE" generator="org.databene.benerator.primitive.datetime.CurrentDateGenerator"/>
<attribute name="CREATEDBY" script="benutzer1"/>
<attribute name="LASTUPDATED" generator="org.databene.benerator.primitive.datetime.CurrentDateGenerator"/>
<attribute name="LASTUPDATEDBY" script="benutzer1"/>
</defaultComponents>
creating entities
Thanks to benerator's many useful defaults, the initial configuration takes minimal effort:
<generate name="db_role" count="10" consumer="db" />
<generate name="db_user" count="100" consumer="db" />
Id generation defaults to an increment strategy, and useful defaults are chosen for all other columns.
resolving relations
If you run the example above, you will get a strange-looking result: You get only 10 db_users though you configured 100.
This is caused by one of benerator's defaults: benerator does not know whether the relation user-role is one-to-one or many-to-one, so it uses one-to-one to avoid problems.
If you want a many-to-one relationship, you need to specify its characteristics, e.g. by a distribution:
<generate name="db_role" count="10" consumer="db" />
<generate name="db_user" count="100" consumer="db">
<reference name="role_fk" targetType="db_role" source="db" distribution="random"/>
</generate>
This will cause creation of 100 users which are evenly distributed over the roles.
You can also configure the generation for each role type separately, e.g.
<generate name="db_role" count="10" consumer="db" />
<generate name="db_user" count="5" consumer="db">
<attribute name="role_fk" values="admin"/>
</generate>
<generate name="db_user" count="95" consumer="db">
<attribute name="role_fk" values="customer"/>
</generate>
Though role_fk is a reference, you can use all features available for <attribute> configuration.
Scripting
As of benerator 0.5.5 there is experimental support for binding scripting languages.
The invocation syntax is as described for SQL invocation and inlining.
<execute type="js">
importPackage(org.databene.model.data);
print('Hello ' + benerator.getContext().get('user').get('name') + '!');
print('DB-URL' + db.getUrl());
var alice = new Entity('TT', 'id', '2', 'name', 'Alice');
db.store(alice);
db.flush();
</execute>
You can bind a language of choice by using the mechanisms of JSR 223: Scripting for the Java Platform.
With Java 6 for Windows, a JavaScript implementation is shipped. For all other platforms and languages you need to configure language support.
There is no connection to benerator internals yet. Since the only scripting used so far was FreeMarker, I will need to resolve some differing concepts and make some major changes with release 0.6.0.
data types
The following data types are supported:
| benerator type | JDBC type name | JDBC type value | Java type |
| byte | Types.TINYINT Types.BIT | -6 -7 | java.lang.Byte |
| short | Types.SMALLINT | 5 | java.lang.Short |
| int | Types.INTEGER | 4 | java.lang.Integer |
| big_integer | Types.BIGINT | -5 | java.math.BigInteger |
| float | Types.FLOAT | 6 | java.lang.Float |
| double | Types.DOUBLE Types.NUMERIC Types.REAL | 8 2 7 | java.lang.Double |
| big_decimal | Types.DECIMAL | 3 | java.math.BigDecimal |
| boolean | Types.BOOLEAN | 16 | java.lang.Boolean |
| char | Types.CHAR | 1 | java.lang.Character |
| date | Types.DATE Types.TIME | 91 92 | java.util.Date |
| timestamp | Types.TIMESTAMP | 93 | java.sql.Timestamp |
| string | Types.VARCHAR Types.LONGVARCHAR Types.CLOB | 12 -1 2005 | java.lang.String |
| object (TODO) | Types.JAVA_OBJECT | 2000 | java.lang.Object |
| binary | Types.BINARY Types.VARBINARY Types.LONGVARBINARY Types.BLOB | -2 -3 -4 2004 | byte[] |
| (specific) | Types.OTHER | 1111 | (specific) |
| n/a | Types.DATALINK Types.NULL Types.DISTINCT Types.STRUCT Types.ARRAY Types.REF | 70 0 2001 2002 2003 2006 | n/a |
querying information from a system
Arbitrary information may be queried from a system by a 'selector' attribute, which is system-dependent. For a database SQL is used:
<generate name="db_order" count="30" pagesize="100">
<attribute name="id" mode="ignored"/>
<attribute name="customer_id" source="db" selector="select id from db_customer" cyclic="true"/>
<consumer ref="db"/>
</generate>
The result set of a selector might be quite large, so different strategies (for wrapping any other generator's output) are supported:
- distribution: Maps to the name of a Sequence or WeightFunction class. For this, the complete result set is loaded into RAM. A Sequence should not be applied to result sets of more than 100,000 elements, and a WeightFunction should be restricted to at most 10,000 elements.
- proxy="skip" or proxy="repeat" for iterating sequentially through the set. 'proxy-param1' and 'proxy-param2' may be used to specify minimum and maximum of repetitions or skipped elements. If cyclic="true", the result set will be re-iterated from the beginning when it has reached the end.
<generate name="db_order_item" count="100" pagesize="100">
<attribute name="id" mode="ignored"/>
<attribute name="number_of_items" min="1" max="27" distribution="cumulated"/>
<attribute name="order_id" source="db" selector="select id from db_order" cyclic="true"/>
<attribute name="product_id" source="db" selector="select ean_code from db_product" distribution="random"/>
<consumer ref="db"/>
</generate>
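For comparison, a sequential iteration with a proxy might be sketched as follows. It reuses the db_order example and the attribute names described in the list above. The values 1 and 3 are arbitrary, and reading them as minimum and maximum repetition counts is an assumption based on that description.

```xml
<generate name="db_order" count="30" pagesize="100">
    <attribute name="id" mode="ignored"/>
    <!-- iterate customer ids in order, repeating each one; restart at the end of the result set -->
    <attribute name="customer_id" source="db" selector="select id from db_customer"
               proxy="repeat" proxy-param1="1" proxy-param2="3" cyclic="true"/>
    <consumer ref="db"/>
</generate>
```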
selector="{select ean_code from db_product where country='${country}'}"
The script is resolved immediately before the first generation and then reused.
If you need dynamic queries that are re-evaluated, you can specify them with double brackets:
selector="{{select ean_code from db_product where country='${shop.country}'}}"
Example:
<generate name="shop" count="10">
<attribute name="country" values="DE,AT,CH"/>
<generate name="product" count="100" consumer="db">
<attribute name="ean_code" source="db" selector="{{select ean_code from db_product where country='${shop.country}'}}"/>
</generate>
</generate>
entity definition

id definition

attribute definition

all supported generator attributes
- name: name of the feature to generate
- type: type of the feature to generate
- nullable: tells if the feature may be null
- mode: controls the processing mode: (normal|ignored|secret)
- pattern: uses a regular expression for String creation or a date format pattern for parsing Dates
- generator: uses a Generator instance for data creation
- values: provides a comma-separated list of values to choose from
- nullQuota: the quota of null values to create
- converter: the class name of a Converter to apply to the generated objects
- dataset: a (nestable) set to create data for, e.g. dataset="US" for the United States
- locale: a locale to create data for, e.g. locale="de"
- offset: the number of elements to skip at the top of a generated/iterated product sequence
- unique: whether to assure uniqueness, e.g. unique="true". Since this needs to keep every instance in memory, use is restricted to 100,000 elements. For larger numbers you should use Sequence-based algorithms.
- source: a system, EntityIterator or file to import data from
- selector: a system-dependent selector to query for data
- trueQuota: the quota of true values created by a Boolean Generator
- min: the minimum Number or Date to generate
- max: the maximum Number or Date to generate
- precision: the resolution of Numbers or Dates to generate
- distribution: the distribution to use for Number or Date generation. This may be a Sequence name or a WeightFunction class name.
- minLength: the minimum length of the Strings that are generated
- maxLength: the maximum length of the Strings that are generated
- cyclic: auto-resets the generator after it has gone unavailable
- proxy: wraps a generator with a proxy (skip|repeat), which skips or repeats products
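To illustrate how several of these attributes might be combined, here is a sketch for a product entity. The column names and value ranges are made up for illustration, and writing the quotas as fractions between 0 and 1 is an assumption.

```xml
<generate name="db_product" count="1000" pagesize="100">
    <!-- random strings of 5 to 30 characters -->
    <attribute name="name" type="string" minLength="5" maxLength="30"/>
    <!-- prices between 0.50 and 99.90 in steps of 0.01 -->
    <attribute name="price" type="big_decimal" min="0.50" max="99.90" precision="0.01"/>
    <!-- roughly 80% of the products are in stock -->
    <attribute name="in_stock" type="boolean" trueQuota="0.8"/>
    <!-- roughly every fourth description is left null -->
    <attribute name="description" type="string" nullQuota="0.25"/>
    <!-- picks one of the listed values -->
    <attribute name="category" values="FOOD,NONFOOD"/>
    <consumer ref="db"/>
</generate>
```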
reference definition

Scripting
Scripts are supported in
- benerator setup files
- properties files
- DbUnit XML files
- CSV files
- Flat files
A script is denoted by curly braces, e.g. '{Hi, I am ${my_name}}'. This syntax will use the default script engine for rendering the text as, e.g. 'Hi, I am Volker'.
The default script engine is set by the property benerator.defaultScript.
If you need to support different script engines (e.g. while combining files from different sources), you can distinguish them by prepending the scripting engine id, e.g. '{ftl:Hi, I am ${my_name}}' or '{Vel:Hi, I am ${my_name}}'
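In a setup file this could look like the sketch below. The property names greeting and greeting_ftl are hypothetical, and the sketch assumes that scripts are resolved in property values like in other parts of the setup file.

```xml
<property name="my_name" value="Volker"/>
<!-- rendered by the default script engine -->
<property name="greeting" value="{Hi, I am ${my_name}}"/>
<!-- rendered by the engine registered under the id 'ftl' -->
<property name="greeting_ftl" value="{ftl:Hi, I am ${my_name}}"/>
```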
Scripts in the benerator setup are evaluated while parsing. If you need to dynamically generate script text at runtime, use an attribute.script field:
<attribute name="total_price" script="{${(product[1] * db_order_item.number_of_items)?c}}" />
With scripts you can access
- environment variables
- JVM parameters
- any JavaBean globally declared in the benerator setup
- the last generated entity of each type
- variable values
Variable names in scripting may not contain dots: a dot always implies navigation, e.g. person.familyName navigates from the person object to the familyName attribute/property/key.
staging
Combining scripting and property files, you get a staging mechanism, which is demonstrated in the shop demo. Check the file demo/shop/shop.ben.xml in your benerator installation. It uses staging for populating all seven supported databases with the same benerator setup file, moving database-specific code to small properties files.
When invoking benerator with a -Dstage=development JVM parameter, you can make your import depend on the stage:
<include uri="{demo/shop/shop.${stage}.properties}" />
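A sketch of how the staged properties might then be used. The keys db_url, db_driver and db_user are hypothetical and would have to be defined in each stage's properties file; the actual shop demo may use different names.

```xml
<!-- started with -Dstage=development, this resolves to demo/shop/shop.development.properties -->
<include uri="{demo/shop/shop.${stage}.properties}"/>
<!-- db_url, db_driver and db_user are hypothetical keys from that properties file -->
<database id="db" url="{${db_url}}" driver="{${db_driver}}" user="{${db_user}}"/>
```

With this layout, switching to another database only requires another properties file and a different stage parameter; the setup file itself stays unchanged.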
template support
You can use DbUnit import files for replicating entity graph structures many times on each generated object. Say, for each customer in a tested online shop, a default order structure should be created. You would then define the order structure in a DbUnit file:
<dataset>
<db_order_item order_id="{${db_order.id}}" number_of_items="2" product_ean_code="8076800195057" total_price="2.40" />
<db_order_item order_id="{${db_order.id}}" number_of_items="1" product_ean_code="8006550301040" total_price="8.70" />
</dataset>
and then create an order for each customer that imports its sub structure from the DbUnit file:
<generate name="db_order" consumer="db">
<id name="id" generator="IncrementalIdGenerator" />
<attribute name="customer_id" source="db" selector="select id from db_customer" />
<iterate name="db_order_item" source="demo/shop/default_order.dbunit.xml" consumer="db">
<id name="id" generator="IncrementalIdGenerator" />
</iterate>
</generate>
Of course, you have to take care of appropriate ids yourself.


