Thursday, September 22, 2011

AUTOSYS

The DataStage 8.5 Job Migration Assistant Makes Upgrading, Testing and Deployment Easier

IBM have released a tool that makes product upgrades and environment management and job deployment and regression testing easier.
This is more than a Migration Assistant, it is useful for migrating between DataStage versions but it is also useful for migrating DataStage jobs from a development environment to testing and production.  It is useful for managing duplicate versions of jobs, executing test plans or job documentation.

The DataStage Migration Assistant can be downloaded here and according to the support website:
The Job Migration Test Assistant tool is a web-based application which assists in migration of DataStage projects from previous releases into the 8.5 release. The Job Migration Test Assistant can compare the status of DataStage jobs or sequences between the older version and the newer version, and perform difference analysis including job level or table schema level checks. Reports can be genrated based on anomalies in the data between the versions.

It is downloaded through the IBM Fix Central website which means anyone can access it.  There is a catch – the product requires a DB2 repository and it comes with a JMTA_DDL_DB2.sql script.  Information Server can store its repository on DB2, SQL Server or Oracle so there will be some sites out there without DB2.  You can use the free express version of DB2 or the version that comes bundled with Information Server.
It comes with a couple prerequisites – two interim fixes for Information Server 8.5.

Migration Assistant for Upgrades


Here is a screenshot of the DataStage job overview showing jobs in the new environment of various statuses – Failed, Not Completed, Compiled, Ran Successfully and Ran with Warnings.  It has another graph for the new environment – Missing and New.

DataStage Migration Assistant

Migration Assistant for Environment Management


There is a lot more to the product than just upgrading DataStage.  The Migration Assistant can help to migrate between environments on the same version – for example synchronising your development, test and production environments.  Finding duplicate jobs between projects and merging them.

There is a tree for comparing objects between environments and comparing DataStage jobs:

DataStage Migration Assistant Job Compare

Migration Assistant for Testing


The Migration Assistant has some functions to help unit test an environment upgrade.  These same test functions can help for any type of DataStage testing – you can keep using the Migration Assistant as part of your standard test plan.  You can compare a release of DataStage jobs with the previous release by comparing a UAT test environment to a production environment, or by comparing a test environment to a test baseline environment.  With regression testing the baseline you are comparing against needs to be similar to the new set of jobs being tested.

It provides a regression testing check – the ability to find any link in any job that has a row count that is different to the regression test baseline.

DataStage Migration Assistant Testing

Another regression testing feature is the log compare – this will compare all the logs for one environment against the logs in another environment to try and find something that is different in a DataStage job – such as a new warning, or an abort or unusual information messages.  There is currently no other feature in any Information Server product that lets you compare DataStage log messages.
IBM InfoSphere DataStage Data Flow and Job Design
Datastage Installation

Friday, September 16, 2011

DataStage 8.1 to DataStage 8.5

1. DataStage Designer performance improvement By changing the Metadata algorythm, copy/delete/save jobs got faster about 30-40%. 
2. Parallel Engine Performance and Resource improvements Resource usage is about 5% smaller than 8.1, for T-Sort, Windows desktop heap size has been decreased 94%. 
3. Transformer enhancements Key break support
LastRowInGroup() function is added. This will return true for the last record of the group.
LastRow() will return the last record of input. 
Output looping :: Allows multiple output records to be created per single input record. 
Input looping    :: Allows aggregation of input records so that aggregated data can be included with the original input data. ( like adding average column to the original input is now possible. ( 2 pass.... calculation. )
New Null handling This is pretty complicate and need more verification by myself to explain clearly. But this is the description I got.
Null values can now be included in any expression. 
        -> Null values no longer need to be explicitly handled.
            A null value in an expression will return a null value result. As long as the target column is nullable, records will not be dropped. Stage variables are now always nullable. 

APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING is prepared to support backward compatibility. 

New Transformer Functions 
Create/offset a time, date or timestamp from component arguments 
DateFromComponents(int32 years, int32 months, int32 dayofmonth)
DateOffsetByComponents(date basedate, int32 yearoffset, int32 monthoffset, int32 dayoffset) DateOffsetByDays(date basedate, int32 offset)
TimeFromComponents(int32 hours, int32 minutes, int32 seconds, int32 microseconds) TimeOffsetByComponents(time basetime, int32 houroffset, int32 minuteoffset, dfloat secondoffset) TimeOffsetBySeconds(time basetime, dfloat secondoffset)
TimestampOffsetByComponents(timestamp basetimestamp, int32 yearoffset, int32 monthoffset, int32 dayoffset, int32 houroffset, int32 minuteoffset, dfloat secondoffset)
TimestampOffsetBySeconds(timestamp basetimestamp, dfloat secondoffset) 

Various packed decimal conversions 
DecimalToDate(decimal basedecimal [,string format] ) 
DecimalToTime(decimal basedecimal [,string format] ) 
DecimalToTimestamp(decimal basedecimal [,string format] )
DateToDecimal(date basedate [,string format] ) 
TimeToDecimal(time basetime [,string format] ) 
TimestampToDecimal(timestamp basetimestamp [,string format] )

4. DataStage Function enhancements New Client \ Domain Compatibility Check Before/after routines now mask encrypted params Copy project permissions from existing project when creating new project Environment variable enhancements: creation during import Add PX Stage Reset Support Enhancement to Parallel Data Set Stage Multiple Null Field Values on Import Enhancements to improve Multi-Client Manager support
5. DataStage Serviceability enhancements New Audit Tracing Enhanced Exception Dialog ISA Lite Enhancements for DataStage Enhanced Project Creation Failure Details
6. ParallelPivot - Adding Vertical Pivoting
7. CVS (Code Version Control Integration) Information Server Manager was created on Eclipse from 8.1 Now the CVS or Subversion plugins to Eclipse are available for DataStage components.

Ten Reasons Why You Need DataStage 8.5


Information Server 8.5 came out a couple weeks ago and is currently available on IBM Passport Advantage for existing customers and from IBM PartnerWorld for IM partners.  The XML pack described below is available as a separate download from the IBM Fix Central website.
This is a list of the ten best things in DataStage 8.5.  Most of these are improvements in DataStage Parallel Jobs only while a couple of them will help Server Job customers as well.

1. It’s Faster

Faster, faster, faster.  A lot of tasks in DataStage 8.5 are at least 40% faster than 8.1 such as starting DataStage, opening a job, running a Parallel job and runtime performance have all improved.

2. It' is now an XML ETL Tool

Previous versions of DataStage were mediocre at processing XML.  DataStage 8.5 is a great XML processing tool.  It can open, understand and store XML schema files.  This pack in New Hierarchical Transformer makes DataStage great aXML Tool and if you have XML files without schemas you can follow a tip at the DataStage Real Time blog: The new XMLPack in 8.5….generating xsd’s….
The new XML read and transform stages are much better at reading large and complex XML files and processing them in parallel:
DataStage 8.5 XML Job

3. Transformer Looping

The best Transformer yet.  The DataStage 8.5 parallel transformer is the best version yet thanks to new functions for looping inside a transformer and performing transformations across a grouping of records.
With looping inside a Transformer you can output multiple rows for each input row.  In this example a record has a company name and four revenue sales figures for four regions – the loop will go through each column and output a row for each value if it is populated:
DataStage 8.5 Transformer Looping
4. Transformer Remembering
DataStage 8.5 Transformer has Remembering and key change detection which is something that ETL experts have been manually coding into DataStage for years using some well known workarounds.  A key change in a DataStage job involves a group of records with a shared key where you want to process that group as a type of array inside the overall recordset. 
I am going to make a longer post about that later but there are two new cache objects inside a Transformer – SaveInputRecord() and GetSavedInputRecord(0 where you can save a record and retrieve it later on to compare two or more records inside a Transformer. 
There are new system variables for looping and key change detection - @ITERATION, LastRow() indicates the last row in a job, LastTwoInGroup(InputColumn) indicates a particular column value will change in the next record.
Here is an aggregation example where rows are looped through and an aggregate row is written out when the key changes:DataStage 8.5 Transformer Aggregation

4. Easy to Install

Easier to install and more robust.  DataStage 8.5 has the best installer of any version of DataStage ever.  Mind you – I jumped aboard the DataStage train in version 3.6 so I cannot vouch for earlier installers but 8.5 has the best wizard, the best pre-requisite checking and the best recovery.  It also has the IBM Support Assistant packs for Information Server that make debugging and reporting of PMRs to IBM much easier.  There is also a Guide to Migrating to InfoSphere Information Serve 8.5 that explains how to migrate from most earlier versions.
Patch Merge – that’s right, patch merge.  The new installer has the ability to merge patches and fixes into the install for easier management of patches and fixes.

5. Check In and Check Out Jobs

Check in and Check out version control.  DataStage 8.5 Manager comes with direct access to the source control functions of CVS and Rational ClearCase in an Eclipse workspace.  You can send artefacts to the source control system and replace a DataStage component from out of the source control system.
DataStage 8.5 Check In
DataStage 8.5 comes with out of the box menu integration with CVS and Rational ClearCase but for other source control systems you need to use the Eclipse source control plugins.

6. High Availability Easier than ever

High Availability – the version 8.5 installation guide has over thirty pages on Information Server topologies including a bunch of high availability scenarios across all tiers of the product.  On top of that there are new chapters for the high availability of the metadata repository, the services layer and the DataStage engine.
  • Horizontal and vertical scaling and load balancing.
  • Cluster support for WebSphere Application Server.
    • Cluster support for XMETA repository: DB2 HADR/Cluster or Oracle RAC.
    • Improved failover support on the engine.
7. New Information Architecture Diagramming Tool
InfoSphere Blueprint Direct – DataStage 8.5 comes with a free new product for creating diagrams of an information architecture and linking elements in the diagram directly into DataStage jobs and Metadata Workbench metadata.  Solution Architects can draw a diagram of a data integration solution including sources, Warehouses and repositories.
DataStage 8.5 Blueprint Director
8. Vertical Pivot
There are people out there who have been campaigning for vertical pivot for a long time – you know who you are!  It is now available and it can pivot multiple input rows with a common key into output rows with multiple columns.  Key based groups, columnar pivot and aggregate functions.
You can also do this type of vertical pivoting in the new Transformer using the column change detection and row cache – but the Vertical pivot stage makes it easier as a specialised stage.
9. Z/OS File Stage
Makes it easier to process complex flat files by providing native support for mainframe files.  Use it for VSAM files – KSDS, ESDS, RRDS.  Sequential QSAM, BDAM, BSAM.  Fixed and variable length records.  Single or multiple record type files.
DataStage 8.5 zOS File Stage
10.  Balanced Optimizer Comes Home
In DataStage 8.5 the Balanced Optimizer has been merged into the Designer and it has a number of usability improvements that turns DataStage into a better ETLT or ELT option.  Balanced Optimizer looks at a normal DataStage job and comes up with a version that pushes some of the steps down onto a source or target database engine.  IE it balances the load across the ETL engine and the database engines.
Version 8.5 has improved logging, improved impact analysis support and easier management of optimised versions of jobs in terms of creating, deleting, renaming, moving, compiling and deploying them.
DataStage 8.5 Balanced Optimizer