Friday, July 07, 2017

XSLT. Apples and Oranges.


Believe it or not, I'm not going to compare XSL processors or whine about poor support for XSLT 2.0. Recently I faced a situation where I had to merge two property lists and produce a combined result. Let's do some shopping and play with grocery lists. The reasons why I'm writing this post are:

  • SOA 11g/12c supports XSLT 2.0 transformations, and you can use all the power of it. 
  • XSLT Mapper is far behind the processor and may even lie to you with its error messages. 
The example below is a little trinket, but it may save you hours of developing external services for data transformation. 



Data staging

Every SOA project starts with the data definition. I'm not going to build any SOA composites; still, before we play with data transformations, let's play with the data definitions. To simulate the original situation I have declared two complex types and two elements in a single XML Schema document with the same namespace. Your data would come from different sources and would most probably have different namespaces, but this works as an illustration. So take a look at the XML Schema diagram below.
The Orchard element may have one or more named elements, each holding an amount. The GroceryList has no named elements, just an array of list items with the attributes Name and Type. In this namespace, ten apples could be described as: 

<Orchard xmlns="http://mmikhail.com/sample/xml/Orchard">
   <Apples>10</Apples>
</Orchard> 

Or as an item in the shopping list:

<GroceryList xmlns="http://mmikhail.com/sample/xml/Orchard">
   <Item Name="Apples">10</Item>
</GroceryList>
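
For readers who prefer source over a diagram, here is a rough sketch of what such a schema document could look like. It is not the original XSD: the type names (OrchardType, GroceryListType), the integer amounts, the string type of the Type attribute and the cardinalities are all assumptions inferred from the description and the samples above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical reconstruction; type names and data types are assumptions -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://mmikhail.com/sample/xml/Orchard"
            targetNamespace="http://mmikhail.com/sample/xml/Orchard"
            elementFormDefault="qualified">

  <!-- Named elements, each holding an amount -->
  <xsd:complexType name="OrchardType">
    <xsd:sequence>
      <xsd:element name="Apples"  type="xsd:int" minOccurs="0"/>
      <xsd:element name="Pears"   type="xsd:int" minOccurs="0"/>
      <xsd:element name="Oranges" type="xsd:int" minOccurs="0"/>
      <xsd:element name="Bananas" type="xsd:int" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- An array of Item elements: the amount is the element value,
       Name and Type are attributes -->
  <xsd:complexType name="GroceryListType">
    <xsd:sequence>
      <xsd:element name="Item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:simpleContent>
            <xsd:extension base="xsd:int">
              <xsd:attribute name="Name" type="xsd:string" use="required"/>
              <xsd:attribute name="Type" type="xsd:string"/>
            </xsd:extension>
          </xsd:simpleContent>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Two global elements in the same namespace -->
  <xsd:element name="Orchard" type="OrchardType"/>
  <xsd:element name="GroceryList" type="GroceryListType"/>
</xsd:schema>
```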

Data manipulations

Imagine that you have two sources: an Orchard element with fruit quantities from the first one, and a GroceryList with some prepopulated data from the second. Now you want to combine the lists and go shopping. I wouldn't even consider implementing that with SOA/OSB constructs, and even in Java you would have to write a serious amount of code (and a decent number of JUnit tests to cover the functionality), so the XSLT processor looks like the only viable solution. If you learned the XSLT Mapper from the Oracle documentation, you may think: "Hey, it's hardly possible even there! I'd rather dump all those lists into the database and query them back with a GROUP BY clause". Yes, you can, but you could also use some XSLT 2.0 features to perform this task without leaving XML. Common sense says we should:

  • Unify data format
  • Combine unified data 
  • Deliver result
The steps from the list map neatly to the main ETL stages (Extract, Transform, Load), so we seem to be on the right track. However, we need one more preparation step: creating the test data. You may create your own or just copy and paste mine.

<?xml version="1.0" encoding="UTF-8" ?>
<Orchard  
   xmlns="http://mmikhail.com/sample/xml/Orchard">
   <Apples>3</Apples>
   <Pears>2</Pears>
   <Oranges>3</Oranges>
   <Bananas>1</Bananas>
</Orchard>

<?xml version="1.0" encoding="UTF-8" ?>
<GroceryList 
   xmlns="http://mmikhail.com/sample/xml/Orchard">
   <Item Name="Apples">10</Item>
   <Item Name="Plums">5</Item>
   <Item Name="Oranges">2</Item>
</GroceryList>

Preparation

To minimize preparation time, clone the project from my repository or copy the files to your existing project. To build a mapper we need the XML Schema document and sample files to test the XSLT. 
The project has been created in Oracle JDeveloper 11g, but you can use it in 12c; the result will be the same, although the mapper in 12.2.1 is much fancier. Now create an empty XSL document using the steps below:
  1. From the "New Gallery" wizard select the General/XML branch and then "XSL Map" from the item list. 
  2. Provide the new XSL file name and point to Orchard as the root element. You may need to specify the XSD file, due to structural differences between 11g and 12c projects. 
  3. Add an additional source with the name OldList and set its type to GroceryList from the same schema document.
  4. Specify the target as the GroceryList element from the same XML schema document.
  5. Switch to the XSL source code view and locate the entry
    <xsl:stylesheet version="1.0" ..
  6. Correct the version to 2.0 and save the file.
    <xsl:stylesheet version="2.0"
From version to version, Oracle declares no support for XSLT 2.0 in the XSL Mapper and encourages you to use the source code editor. I'd recommend that you follow this advice and stay in the source view most of the time, especially if you develop something more complex than element <-> element mappings.

Unify data format

Another piece of common knowledge: XSLT Mapper doesn't work well with arrays. Of course, there is a way to work with them: whenever you need to build a list of nodes, create it in an XSL variable and then copy it to the result.  
In your document, locate the OldList parameter definition and add the fragment below right after it.

<xsl:variable name="Orchard">
   <xsl:for-each select="/ns0:Orchard/*">
       <ns0:Item Name="{name()}">
            <xsl:value-of select="."/>
       </ns0:Item>
   </xsl:for-each>
</xsl:variable>

Let's go through the code:

  •  xsl:variable - declares a new variable with the name Orchard; later we will refer to it as $Orchard.
  • <xsl:for-each> - loops through all child elements of /ns0:Orchard regardless of the element name (* is a wildcard).
  •  For each child of /ns0:Orchard we create a new ns0:Item element with the attribute Name equal to the selected element's name. The function name() returns the element name (including the prefix, if the source uses one), and {} is an attribute value template, a shorthand for building the attribute with <xsl:attribute> and <xsl:value-of select="name()"/>. 
  • Every Item gets the current element's value: "." in the xsl:value-of command.
Now we have a data structure similar to the GroceryList element. 

Combine unified data

To group and query the data I use the xsl:for-each-group command. It's not supported by the XSLT Mapper, but the XSLT 2.0 processor handles it without any issues. Let's take a look at the code below:

<xsl:for-each-group 
    select="$OldList/ns0:GroceryList/ns0:Item, $Orchard/ns0:Item" 
    group-by="@Name">
       <ns0:Item Name="{current-grouping-key()}">
          <xsl:value-of select="current-group()[1]"/>
       </ns0:Item> 
</xsl:for-each-group>
  • xsl:for-each-group - loops over the groups built from the elements listed in the select attribute. The group-by attribute defines how the data should be grouped. In the example above, we take ns0:Item elements from the two variables and group them by the Name attribute.
  • For every group, we create a new ns0:Item element and set its Name attribute to the current grouping key: {current-grouping-key()}.
  • The function current-group() returns the list of elements that share the same grouping key. The command above copies the first value from that list: current-group()[1].
If you have no plans to use it with SOA/JDeveloper 11g, you may use this XSLT construct directly in the document template. However, XSLT Mapper 11g shows some scary error messages. To suppress the errors, just wrap the code into another variable; let's name it $groups.

Deliver results

It's a really simple step: all we need is to define the root template and copy our list from the variable. 
The code for the template is:

<xsl:template match="/">
    <ns0:GroceryList>
      <xsl:copy-of select="$groups"/>
   </ns0:GroceryList>   
</xsl:template>

Save the template and test it.
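
For reference, the pieces from the three steps can be assembled roughly as follows. This is a stripped-down sketch, not the file JDeveloper produces: the mapper adds its own annotations and extra namespace declarations, and I assume here that the OldList source is declared as a plain xsl:param holding the GroceryList document and that the first source uses the corrected root element name Orchard.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ns0="http://mmikhail.com/sample/xml/Orchard">

  <!-- Additional source from the mapper wizard (assumed declaration) -->
  <xsl:param name="OldList"/>

  <!-- Unify: turn every child of Orchard into an Item with a Name attribute -->
  <xsl:variable name="Orchard">
    <xsl:for-each select="/ns0:Orchard/*">
      <ns0:Item Name="{name()}">
        <xsl:value-of select="."/>
      </ns0:Item>
    </xsl:for-each>
  </xsl:variable>

  <!-- Combine: group Items from both lists by Name.
       Wrapping the loop in a variable keeps XSLT Mapper 11g quiet -->
  <xsl:variable name="groups">
    <xsl:for-each-group
        select="$OldList/ns0:GroceryList/ns0:Item, $Orchard/ns0:Item"
        group-by="@Name">
      <ns0:Item Name="{current-grouping-key()}">
        <!-- First item of the group, i.e. the value from OldList if present -->
        <xsl:value-of select="current-group()[1]"/>
      </ns0:Item>
    </xsl:for-each-group>
  </xsl:variable>

  <!-- Deliver: copy the grouped Items into the target element -->
  <xsl:template match="/">
    <ns0:GroceryList>
      <xsl:copy-of select="$groups"/>
    </ns0:GroceryList>
  </xsl:template>

</xsl:stylesheet>
```

Both variables are temporary trees, which is why $Orchard/ns0:Item and the final xsl:copy-of can navigate into them directly.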


Awesome, the transformation works as expected, although the results may be a little bit disappointing. The Orchard entry has 3 for apples and the original grocery list has 10. In the result, you can see that the value has been copied from the first entry in the group. The XSLT processor builds each group in the order the sources appear in the select clause of the for-each-group command. If you want to overwrite the old list value you have two options: change the source order or select a different element from the group. Additionally, you may play with aggregate functions to get combinations. In the table below you may find some ideas on how to mix and match the grouped data.

Enjoy and happy data transformations!

Selector:
<xsl:value-of 
  select="current-group()[1]"/>
Result:
<ns0:GroceryList 
    xmlns:ns0="http://mmikhail.com/sample/xml/Orchard" >
   <ns0:Item Name="Apples">10</ns0:Item>
   <ns0:Item Name="Plums">5</ns0:Item>
   <ns0:Item Name="Oranges">2</ns0:Item>
   <ns0:Item Name="Pears">2</ns0:Item>
   <ns0:Item Name="Bananas">1</ns0:Item>
</ns0:GroceryList>
Description: Uses the value from the first source.

Selector:
<xsl:value-of 
  select="current-group()[last()]"/>
Result:
<ns0:GroceryList 
    xmlns:ns0="http://mmikhail.com/sample/xml/Orchard" >
   <ns0:Item Name="Apples">3</ns0:Item>
   <ns0:Item Name="Plums">5</ns0:Item>
   <ns0:Item Name="Oranges">3</ns0:Item>
   <ns0:Item Name="Pears">2</ns0:Item>
   <ns0:Item Name="Bananas">1</ns0:Item>
</ns0:GroceryList>
Description: Uses the value from the second source. I use the function last() instead of index 2 because some groups have only one element, and index 2 would select nothing for them, leaving the Item empty.

Selector:
<xsl:value-of 
   select="sum(current-group())"/>
Result:
<ns0:GroceryList 
   xmlns:ns0="http://mmikhail.com/sample/xml/Orchard">
   <ns0:Item Name="Apples">13</ns0:Item>
   <ns0:Item Name="Plums">5</ns0:Item>
   <ns0:Item Name="Oranges">5</ns0:Item>
   <ns0:Item Name="Pears">2</ns0:Item>
   <ns0:Item Name="Bananas">1</ns0:Item>
</ns0:GroceryList>
Description: Each Item contains the sum of the source values. You may experiment with the min, max, and count functions as well.


 Apple and Orange picture by Michael Johnson [CC BY 2.0], via Wikimedia Commons
