databene

 

Migrating Benerator 0.6 projects to version 0.7

A lot has changed from Benerator 0.6.6 to 0.7, based on three major driving forces:

  1. Frequent user mistakes: If a large percentage of users make the same mistake, there is probably something wrong with the approach.

  2. Development towards a version 1.0: User demand and my own plans led to several requirements for a 1.0 version which cannot be fulfilled with the 0.6 concepts.

  3. Making Benerator useful for functional testing and providing a rich API to client software like Feed4JUnit and Feed4TestNG.

Large parts of the Benerator Core have been redesigned. Several changes are not backwards compatible and are listed below, in descending order of their probability to affect you. If you have used Benerator as is, you are likely to migrate with little effort; if you have programmed custom Java extensions (Generators or Consumers), you will have to learn the new interfaces and rework your code.

 

Syntactical Changes

<import defaults="true" />

It is no longer necessary to specify this: defaults are imported automatically now (uh - I guess that should be the default behaviour for something called 'default'). If you want to stick with the old behaviour, set defaultImports to false in the root element:

<setup ... defaultImports="false">

 

Differentiation between global settings and JavaBean (POJO) properties

Benerator used the same element name <property> for very different things. In order to avoid confusion, Benerator 0.7 renames the former global <property> (which was a top-level element in the XML structure) to <setting>. The <property> element used within <bean> or <consumer> elements remains unchanged.

<setup>
    <property name="user_count" value="1000" />
    ...
</setup>

is migrated to

<setup>
    <setting name="user_count" value="1000" />
    ...
</setup>

 

Element Execution Order

A frequent user problem was that the descriptor elements were not necessarily executed in the order in which they appeared in the descriptor files (variables first, then attributes/ids/references, then anything else). Now the elements are executed in the specified order, with one limitation: the consumption of a generated entity happens immediately before the execution of a sub-generate/iterate, so attributes/ids/references may not be placed after a sub-generate/iterate.

Thus, a descriptor file now reflects the behaviour of a programming language much more intuitively than before.

Example:

<generate name="u" type="user">
    <echo>Starting user generation</echo> <!-- yes, now it is executed as the first step -->
    <variable name="p" generator="PersonGenerator" />
    <attribute name="first_name" script="p.givenName" />
    <attribute name="last_name" script="p.familyName" />
    <!-- now you can access the current entity's attributes from a variable -->
    <variable name="_fullName" script="u.first_name + ' ' + u.last_name" />

    <echo>{'Created user ' + _fullName}</echo>
    <generate name="order" type="order">
        ...
    </generate>
    <!-- the 'outer' entity 'u' is persisted before the sub-generate,
         so it does not make sense to assign any attribute/id/reference behind that point -->

    ...
</generate>


Scope of Generated Data

In former versions, all generated data like entities and variables was put into the global context and thus available anywhere and anytime, making it difficult to spot mistakes in the descriptor file. With Benerator 0.7 all generated data is scoped to the descriptor level in which it was created and accessible only in that level and below. If you want to make generated data globally available, you need to assign it to a <setting> explicitly.
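The visibility rule can be pictured as a chain of nested scopes. The following Java sketch only illustrates that rule; the class ScopeSketch and its methods are made up for this example and are not part of the Benerator API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of level-based scoping; not a Benerator class.
class ScopeSketch {
    private final ScopeSketch parent;                       // enclosing descriptor level, null for the global level
    private final Map<String, Object> values = new HashMap<>();

    ScopeSketch(ScopeSketch parent) {
        this.parent = parent;
    }

    // Stores a value at this level only.
    void set(String name, Object value) {
        values.put(name, value);
    }

    // Looks at this level first, then at the enclosing levels; nested levels are never searched.
    Object get(String name) {
        if (values.containsKey(name)) {
            return values.get(name);
        }
        return parent != null ? parent.get(name) : null;
    }

    public static void main(String[] args) {
        ScopeSketch global = new ScopeSketch(null);
        ScopeSketch outer = new ScopeSketch(global);  // e.g. an outer <generate>
        ScopeSketch inner = new ScopeSketch(outer);   // e.g. a sub-<generate>
        outer.set("u", "user-1");
        System.out.println(inner.get("u"));   // visible in the level below
        System.out.println(global.get("u"));  // not visible globally
    }
}
```

A value set in the outer level is found from the inner level, but a lookup on the global level does not see it.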


precision renamed to granularity, dropped totalDigits and fractionDigits

'precision' is a term which is generally interpreted to mean that each number is a full multiple of the precision. Benerator has a different semantic: it generates numbers as a minimum value ('min') plus a full multiple of the precision. Thus, the corresponding configuration element has been renamed from 'precision' to 'granularity' in order to reflect this difference and possibly combine distinct precision and granularity concepts in future versions. The change also affects property names in some date/time generator classes. In this refactoring process, the settings 'totalDigits' and 'fractionDigits' were dropped without replacement.

Example: Generation of odd numbers:

<attribute name="x" type="int" min="1" max="999" granularity="2" />

generates numbers from the set {1, 3, 5, …, 995, 997, 999}, whose members are obviously not multiples of 2 (as a precision concept would imply).
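The 'min plus multiple of granularity' rule can be spelled out in a few lines of Java. This is a minimal sketch of the value set only, not Benerator code: the class GranularitySketch is invented for illustration, and it lists the values in order while Benerator would pick from them according to the configured distribution.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the granularity semantics; not a Benerator class.
class GranularitySketch {

    // Returns every value min + k * granularity (k = 0, 1, 2, ...) that does not exceed max.
    static List<Integer> values(int min, int max, int granularity) {
        if (granularity <= 0) {
            throw new IllegalArgumentException("granularity must be positive");
        }
        List<Integer> result = new ArrayList<>();
        // a long counter avoids overflow when max is close to Integer.MAX_VALUE
        for (long v = min; v <= max; v += granularity) {
            result.add((int) v);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> odd = values(1, 999, 2);
        System.out.println(odd.get(0) + " ... " + odd.get(odd.size() - 1));
    }
}
```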


 

Renamed platform 'flat' to 'fixedwidth'

The name was inappropriate. If you used it, please update your import:

<import platforms='flat' />

becomes

<import platforms='fixedwidth' /> 

The default file suffix has been changed from .flat to .fcw. Rename your data files (or config) accordingly.

 

minExclusive, maxExclusive

Previously, you could use numerical 'minExclusive' and 'maxExclusive' values for specifying open intervals. This allowed the syntactically useless combination of 'min' and 'minExclusive' values. The syntax has been made orthogonal by introducing boolean 'minInclusive' and 'maxInclusive' constraints which have to be combined with a numerical 'min' or 'max' respectively. The default value of 'minInclusive' and 'maxInclusive' is 'true'. Example:

<attribute name="x" type="double" min="1" max="10" maxInclusive="false" granularity="0.1"/>
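To make the effect of the boolean flags concrete, here is a small Java sketch that enumerates the candidate values of such an interval. The class IntervalSketch is invented for this example and is not a Benerator API. It also assumes that an exclusive lower bound simply starts one granularity step above 'min', which the text above does not state.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of min/max with inclusive flags; not a Benerator class.
class IntervalSketch {

    // Lists min + k * granularity within the interval; BigDecimal avoids floating-point drift.
    static List<BigDecimal> values(String min, String max,
                                   boolean minInclusive, boolean maxInclusive,
                                   String granularity) {
        BigDecimal lo = new BigDecimal(min);
        BigDecimal hi = new BigDecimal(max);
        BigDecimal step = new BigDecimal(granularity);
        if (step.signum() <= 0) {
            throw new IllegalArgumentException("granularity must be positive");
        }
        List<BigDecimal> result = new ArrayList<>();
        // Assumption: an exclusive lower bound skips 'min' itself and starts one step later.
        BigDecimal v = minInclusive ? lo : lo.add(step);
        while (v.compareTo(hi) < 0 || (maxInclusive && v.compareTo(hi) == 0)) {
            result.add(v);
            v = v.add(step);
        }
        return result;
    }

    public static void main(String[] args) {
        // Mirrors the attribute above: min=1, max=10, maxInclusive=false, granularity=0.1
        List<BigDecimal> xs = values("1", "10", true, false, "0.1");
        System.out.println(xs.get(0) + " ... " + xs.get(xs.size() - 1));
    }
}
```

For the attribute shown above, the sketch yields the values from 1 up to 9.9; the upper bound 10 is left out.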


CSVEntityExporter

To enable users to distinguish between null values and empty strings, a property 'quoteEmpty' has been introduced which is true by default and causes empty strings to be rendered in double quotes. When 'quoteEmpty' is true, a CSV row with the values 1, null, empty string is rendered as 1,,"", otherwise as 1,, (former behaviour).

In order to keep the former behaviour, set 'quoteEmpty' to 'false'.
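The rendering rule can be sketched in Java as follows. This is not the CSVEntityExporter implementation: the class QuoteEmptySketch and its renderRow method are made up for illustration, and escaping of separators or quotes inside values is deliberately left out.

```java
import java.util.StringJoiner;

// Hypothetical illustration of the quoteEmpty rule; not the CSVEntityExporter code.
class QuoteEmptySketch {

    // Renders one CSV row: null becomes an empty cell, an empty string becomes "" if quoteEmpty is set.
    static String renderRow(boolean quoteEmpty, Object... values) {
        StringJoiner row = new StringJoiner(",");
        for (Object value : values) {
            if (value == null) {
                row.add("");
            } else if (value.toString().isEmpty()) {
                row.add(quoteEmpty ? "\"\"" : "");
            } else {
                row.add(value.toString());
            }
        }
        return row.toString();
    }

    public static void main(String[] args) {
        System.out.println(renderRow(true, 1, null, ""));   // prints 1,,""
        System.out.println(renderRow(false, 1, null, ""));  // prints 1,,
    }
}
```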


Scripted context access

Script expressions were able to use a keyword 'benerator' to access the generation context. The name was changed from 'benerator' to 'context' to reflect the semantics better.

Example: A statement

<if test="benerator.get('user_count') != 0">
     <setting name="initUserModule" value="true" />
</if>

is migrated to

<if test="context.get('user_count') != 0"> <!-- this was benerator.get(...) before -->
     <setting name="initUserModule" value="true" />
</if>

The same applies for the <include> of property files: Formerly you could write

benerator.defaultEncoding=iso-8859-1 

now it's

context.defaultEncoding=iso-8859-1   

 

Removed multithreading support for <generate> and <iterate>

It never worked satisfactorily, and some of the current changes make safe multithreaded execution even harder. But there will be a multithreaded Benerator future, some day...

In clauses like

<generate ... threads="5">

remove the 'threads' attribute:

<generate ...>

 

 

Dropped variable support in XML Schema-based file generation

Sorry, I had to make a change that broke variable support in XML Schema-based generation, and with the current focus on database data generation I do not have the time to migrate XML generation to the all-new mechanism.

 

 

Database-related changes

Sequence-based generators

The 'source' property of DBSequenceGenerator and DBSeqHiLoGenerator has been renamed to 'database'.


defaultOneToOne is false by default

Benerator has a global setting 'defaultOneToOne' which was true by default, causing database references to be resolved in a sequential, unique manner, just as a CSV file would be iterated once, row by row. Benerator 0.7+ sets this to false, resulting in a random and repeated iteration of the available values as appropriate for many-to-one associations in a database. If you want to keep the old behaviour, write

<benerator defaultOneToOne="true">


OfflineSequenceGenerator was removed

The DBSequenceGenerator now has a property 'cached': when set to true, the class exhibits the same behaviour as the former OfflineSequenceGenerator (which was dropped); when set to false, it exhibits the traditional behaviour of the DBSequenceGenerator.

 

 

Interface changes

The following changes are only relevant if you have implemented Benerator interfaces or inherited from Benerator components. In this case I apologize: I had to change a lot, on the one hand moving the extension interfaces towards a state that will hopefully hold for Benerator 1.0, and on the other hand cleaning up redundancies in the Generator inheritance hierarchy.


Generator interface

generate() signature and contract changed

For a Generator<E>, the new signature is:

ProductWrapper<E> generate(ProductWrapper<E> wrapper)

The former semantics was that a generator had to return null if its value set was depleted and it thus became unavailable. The downside of this approach was that Generators were not able to return null as a value. This has changed by using the ProductWrapper as return type: a generated null value is wrapped in a ProductWrapper, whereas a generator that has become unavailable returns null instead of a ProductWrapper. The ProductWrapper argument makes it possible to improve performance by reusing a ProductWrapper instance several times.
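The contract may be easier to grasp from a small, self-contained Java sketch. The ProductWrapper and Generator types below are simplified stand-ins written for this example, not the real Benerator classes; only the generate() signature and the unwrap() method are taken from this page, while wrap(), ListGenerator and drain() are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for illustration only; these are not the real Benerator types.
class WrapperContractSketch {

    // Hypothetical minimal wrapper: holds one product, which may be null.
    static class ProductWrapper<E> {
        private E product;

        ProductWrapper<E> wrap(E product) {
            this.product = product;
            return this;
        }

        E unwrap() {
            return product;
        }
    }

    // Same signature as described above.
    interface Generator<E> {
        ProductWrapper<E> generate(ProductWrapper<E> wrapper);
    }

    // Emits the given values once (null values included), then reports depletion.
    static class ListGenerator<E> implements Generator<E> {
        private final List<E> values;
        private int index = 0;

        ListGenerator(List<E> values) {
            this.values = values;
        }

        @Override
        public ProductWrapper<E> generate(ProductWrapper<E> wrapper) {
            if (index >= values.size()) {
                return null;                          // unavailable: no wrapper at all
            }
            return wrapper.wrap(values.get(index++)); // a null value is still wrapped
        }
    }

    // Client loop: one wrapper instance is reused for every call.
    static <E> List<E> drain(Generator<E> generator) {
        List<E> result = new ArrayList<>();
        ProductWrapper<E> wrapper = new ProductWrapper<>();
        ProductWrapper<E> current;
        while ((current = generator.generate(wrapper)) != null) {
            result.add(current.unwrap());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> products = drain(new ListGenerator<>(Arrays.asList("a", null, "b")));
        System.out.println(products.size()); // prints 3, the null counts as a product
    }
}
```

The point of the sketch is the loop condition: the client stops when generate() returns null, so a null product inside the wrapper no longer ends the generation.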

Renamed SimpleConverter to UnsafeConverter and SimpleGenerator to UnsafeGenerator

Renamed ValidatingGenerator.generateImpl() to doGenerate()

If you inherited from ValidatingGenerator and implemented generateImpl(), just change the name of that method to doGenerate().

Deleted MappedWeightSampleGenerator

Use AttachedWeightSampleGenerator instead.

Renamed IndividualWeightGenerator to IndividualWeightSampleGenerator

Renamed DistributingSampleGeneratorProxy to IndexBasedSampleGeneratorProxy


StorageSystem interface

Moved interface to another package

Moved StorageSystem interface to org.databene.benerator, moved implementations to org.databene.storage.

Changed interface

The generic parameter has been removed from queryEntityIds() and query(). It did not prove to be useful.


Consumer interface

Moved interface to a different package

Moved Consumer interface to org.databene.benerator, moved general implementations to org.databene.consumer.

Changed signature of the ...Consuming() methods

Replaced generic parameter with ProductWrapper. Call the ProductWrapper's unwrap() method to get the generated data object.


Other changes

Replaced Iterable and Iterator with DataSource and DataIterator

This reflects changes in the underlying webdecs API which were performed to provide an efficient API for concurrent data consumption. The applied concept is similar to the change in Generator.generate() described above.

Removed 'name' property from Sequence

It was dropped without replacement. If you have inherited from the Sequence class, you might need to remove the name argument that your child class passes to the parent constructor.