Friday, January 21, 2011

What is new for DataStage 8 on the Information Server

DataStage 8 on the Information Server looks the same as previous releases but has some major changes under the hood and a few extra bells and whistles.

This post looks at what is new or changed in DataStage jobs. There are a lot of new functions for managing, running and reporting on jobs, but I will talk about those in another post, or you can look back at my (much) earlier DataStage Hawk preview post.

Goodbye DataStage 7

It's time to bid goodbye to tired old DataStage 7.

You did a good job, you struggled on for as long as you could, but like all DataStage versions through the annals of history you didn't have the right metadata repository and you didn't play well with your brothers and sisters in your suite.

DataStage 8, on the other hand, is much shinier and comes with a better metadata story, as you get the new Metadata Server and the common connectors.

Release Date

The Windows version of the Information Server and DataStage 8 are out now. No sign yet of the version for other platforms.

DataStage Versions

DataStage 8 can only upgrade a DataStage 7 server. It cannot upgrade earlier server versions, though it can co-exist with them. DataStage 8 can, however, import and upgrade export files from earlier versions of DataStage. I don't know how far back this support goes.

All the DataStage 7.x versions are available in version 8:

  • DataStage Enterprise Edition: Parallel, Server and Sequence Jobs
  • DataStage Server Edition: Server and Sequence Jobs
  • DataStage MVS: Mainframe Jobs
  • DataStage Enterprise for z/OS: runs on Unix System Services
  • DataStage for PeopleSoft: 2 CPU limit with Server and Sequence jobs.

I don't know whether you will ever see this version of DataStage in the PeopleSoft EPM bundle. However, you may be able to upgrade existing PeopleSoft implementations to this version. Drop me a message if you try.

DataStage Addons

The DataStage Enterprise Packs and Change Data Capture components are available in version 8 as shown in the version 8 architecture overview:

DataStage Architecture Overview

Enterprise PACKs

  • SAP BW Pack
    • BAPI: (Staging Business API) loads from any source to BW.
    • OpenHub: extract data from BW.
  • SAP R/3 Pack
    • ABAP: (Advanced Business Application Programming) auto generate ABAP, Extraction Object Builder, SQL Builder, Load and execute ABAP from DataStage, CPI-C Data Transfer, FTP Data Transfer, ABAP syntax check, background execution of ABAP.
    • IDoc: create source system, IDoc listener for extract, receive IDocs, send IDocs.
    • BAPI: BAPI explorer, import export Tables Parameters Activation, call and commit BAPI.
  • Siebel Pack
    • EIM: (Enterprise Integration Manager) interface tables
    • Business Component: access business views via Siebel Java Data Bean
    • Direct Access: use a metadata browser to select data to extract
    • Hierarchy: for extracts from Siebel to SAP BW.
  • Oracle Applications Pack
    • Oracle flex fields: extract using enhanced processing techniques.
    • Oracle reference data structures: simplified access using the Hierarchy Access component.
    • Metadata browser and importer
  • DataStage Pack for PeopleSoft Enterprise
    • Import business metadata via a metadata browser.
    • Extract data from PeopleSoft tables and trees.
  • JD Edwards Pack
    • Standard ODBC calls
    • Pre-joined database tables via business views

Change Data Capture

These are add-on products (at an additional fee) that attach themselves to source databases and perform change data capture. Most source system database owners I've come across don't like you playing with their production transactional database and will not let you near it with a ten foot pole, but I guess there are exceptions:

  • Oracle
  • Microsoft SQL Server
  • DB2 for z/OS
  • IMS

There are three ways to get incremental feeds on the Information Server: the CDC products for DataStage, the Replication Server (renamed Information Integrator: Replication Edition, does DB2 replication very well) and the change data capture functions within DataStage jobs such as the parallel CDC stage.
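
For anyone who hasn't met change data capture inside a job before, here is a minimal Python sketch of the general idea: compare yesterday's snapshot with today's on a key and label each difference. It illustrates snapshot comparison in general, not how the parallel CDC stage is actually implemented, and the names `capture_changes`, `yesterday` and `today` are made up for the example.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def capture_changes(before: List[Row], after: List[Row], key: str) -> List[Tuple[str, Row]]:
    """Compare two snapshots keyed on `key` and label each difference."""
    before_by_key = {row[key]: row for row in before}
    after_by_key = {row[key]: row for row in after}

    changes: List[Tuple[str, Row]] = []
    # Rows in the new snapshot are either brand new or possibly changed.
    for k, row in after_by_key.items():
        if k not in before_by_key:
            changes.append(("insert", row))
        elif row != before_by_key[k]:
            changes.append(("update", row))
    # Rows that only exist in the old snapshot have been deleted.
    for k, row in before_by_key.items():
        if k not in after_by_key:
            changes.append(("delete", row))
    return changes


# Hypothetical sample data, for illustration only.
yesterday = [{"id": 1, "city": "Perth"}, {"id": 2, "city": "Sydney"}]
today = [{"id": 1, "city": "Darwin"}, {"id": 3, "city": "Hobart"}]
changes = capture_changes(yesterday, today, key="id")
```

Unchanged rows are simply dropped here, which is usually what you want for an incremental feed.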

Removed Functions

These are the functions that are not in DataStage 8. Please imagine the last waltz playing in your head as you peruse this list:

  • dssearch command line function
  • dsjob "-import"
  • Version Control tool
  • Released jobs
  • Oracle 8i native database stages
  • ClickPack

The loss of the Version Control tool is not a big deal as the import/export functions have been improved. Building a release file as an export in version 8 is easier than building it in the Version Control tool in version 7.

Database Connectivity

The common connection objects functionality means the very wide range of DataStage database connections is now available across Information Server products.

Latest supported databases for version 8:

  • DB2 8.1, 8.2 and 9.1
  • Oracle 9i, 10g, 10gR2 (not Oracle 8)
  • SQL Server 2005 plus stored procedures.
  • Teradata v2r5.1, v2r6.0, v2r6.1 (DB server) / 8.1 (TTU) plus Teradata Parallel Transport (TPT) and stored procedures and macro support, reject links for bulk loads, restart capability for parallel bulk loads.
  • Sybase ASE 15, Sybase IQ 11.5, 12.5, 12.7
  • Informix 10 (IDS)
  • SAS 612, 8.1, 9.1 and 9.1.3
  • IBM WS MQ 6.1, WS MB 5.1
  • Netezza v3.1
  • ODBC 3.5 standard and level 3 compliant
  • UniData 6 and UniVerse ?
  • Red Brick ?

This is not the complete list. Some databases and database versions are missing, and more databases can be accessed through the ODBC stage.

New Database Connector Functions

This is a big area of improvement.

  • LOB/BLOB/CLOB Data: pictures, documents etc of any size can now be moved between databases. After choosing the LOB data type you can choose to pass the data inline or as a link reference.
  • Reject Links: optionally append error codes and messages, conditionally filter types of rejection, fail a job based on a percentage threshold of failures.
  • Schema Reconciliation: where the hell has this function been all my life? Automatically compare your DataStage schema to the database schema and perform minor data type conversions.
  • Improved SQL Builder that supports more database types, although if you didn't like the version 7 one you won't like the 8 one either. (Kim Duke, I'm looking at you).
  • Test button on connectors. Test! You don't have to view data or run a job to find out if the stupid thing works.
  • Drag and drop your configured database connections onto jobs.
  • Before and after SQL defined per job or per node with a failure handling option. Neater than previous versions.

DataStage 8 gives you access to the latest versions of databases that DataStage 7 may never get. Extra functions on all connectors include improved reject handling, LOB support and easier stage configuration.
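
The percentage threshold on reject links is easier to picture with a small example. The Python sketch below is only an illustration of the idea, assuming the check happens once at the end of the load. The connector's real behaviour and settings may differ, and `load_with_rejects`, `fake_write` and `RejectThresholdExceeded` are hypothetical names, not part of any DataStage API.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


class RejectThresholdExceeded(Exception):
    """Raised when too large a share of rows was rejected."""


def load_with_rejects(
    rows: Iterable[Row],
    write_row: Callable[[Row], None],
    max_reject_pct: float,
) -> List[Tuple[Row, str]]:
    """Write rows, collect rejects with their error message, fail past the threshold."""
    rejects: List[Tuple[Row, str]] = []
    total = 0
    for row in rows:
        total += 1
        try:
            write_row(row)
        except ValueError as err:
            # The "reject link": keep the row and append the error message.
            rejects.append((row, str(err)))
    reject_pct = 100.0 * len(rejects) / total if total else 0.0
    if reject_pct > max_reject_pct:
        raise RejectThresholdExceeded(f"{reject_pct:.1f}% of rows rejected")
    return rejects


def fake_write(row: Row) -> None:
    """Stand-in for a database write that refuses rows without an amount."""
    if row["amount"] is None:
        raise ValueError("amount is missing")


sample_rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 7},
    {"id": 4, "amount": 3},
]
rejects = load_with_rejects(sample_rows, fake_write, max_reject_pct=50.0)
```

One row in four is rejected here, so a 50 percent threshold lets the load finish while a 10 percent threshold would raise the exception and fail the job.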

Code Packs

These packs can be used by server and/or parallel jobs to interact with other coding languages. This lets you access programming modules or functions within a job:

  • Java Pack: produce or consume rows for DataStage Parallel or Server jobs. Use a Java transformer.
  • Web Service Pack: access web services operations in a Server job transformer or Server routine.
  • XML Pack: read, write or transform XML files in parallel or server jobs.

The DataStage stages, custom stages, transformer functions and routines will usually be faster at transforming data than these packs. However, the packs are useful for re-using existing code.

New Stages

A new stage from the IBM software family, new stages from new partners and the convergence of QualityStage functions into DataStage. Apart from the SCD stage these all come at an additional cost.

  • WebSphere Federation and Classic Federation
  • Netezza Enterprise Stage
  • SFTP Enterprise Stage
  • iWay Enterprise Stage
  • Slowly Changing Dimension: for type 1 and type 2 SCDs.
  • Six QualityStage stages

There are four questions that have been asked since the dawn of time. What is the meaning of life? What's this rash that comes and goes? If you leave me can I come too? How do I populate a slowly changing dimension using DataStage? The answers being 42, visit a clinic, piss off and use the new SCD stage.
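
If type 1 and type 2 are new to you, here is a rough Python sketch of the difference. It is not how the SCD stage works internally, just the textbook logic: a type 1 change overwrites the current row, while a type 2 change closes the current row and adds a new one. The `DimRow` layout and the choice of `email` as the type 1 column and `city` as the type 2 column are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DimRow:
    surrogate_key: int
    customer_id: str                 # business key
    email: str                       # type 1 attribute: overwrite in place
    city: str                        # type 2 attribute: keep history
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current row


def apply_change(dim: List[DimRow], customer_id: str, email: str, city: str, as_of: date) -> None:
    """Apply one incoming source record to the dimension."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    next_key = max((r.surrogate_key for r in dim), default=0) + 1
    if current is None:
        # New business key: plain insert.
        dim.append(DimRow(next_key, customer_id, email, city, as_of))
    elif current.city != city:
        # Type 2: expire the current row and add a new current row.
        current.valid_to = as_of
        dim.append(DimRow(next_key, customer_id, email, city, as_of))
    else:
        # Type 1: overwrite, no history kept.
        current.email = email


dim: List[DimRow] = []
apply_change(dim, "C1", "a@example.com", "Perth", date(2011, 1, 1))
apply_change(dim, "C1", "b@example.com", "Perth", date(2011, 2, 1))   # type 1 change
apply_change(dim, "C1", "b@example.com", "Darwin", date(2011, 3, 1))  # type 2 change
```

After the three calls the dimension holds two rows for customer C1: the expired Perth row and the current Darwin row.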

New Functions in Existing Stages

  • Complex Flat File Stage: Multi Format File (MFF) in addition to existing COBOL file support.
  • Surrogate Key Generator: now maintains the key source via integrated state file or DBMS sequence.
  • Lookup Stage: range lookups, defined by checking a value against high and low range fields on the input or reference data table. Updatable in-memory lookups.
  • Transformer Stage: new surrogate key functions Initialize() and GetNextKey().
  • Enterprise FTP Stage: now choose between FTP and SFTP transfer.

You can achieve most of these functions in the current version with extra coding except for in-memory lookups. This is a killer function in DataStage 8.
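
For the curious, a range lookup boils down to something like the Python sketch below: find the reference row whose low and high fields bracket the incoming value. This is an illustration only, assuming inclusive, non-overlapping ranges sorted by their low value. The `bands` table and the `range_lookup` function are invented for the example and say nothing about how the Lookup stage does it.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical reference table: (low, high, label), inclusive and non-overlapping,
# sorted by the low value.
bands: List[Tuple[int, int, str]] = [
    (0, 999, "bronze"),
    (1000, 4999, "silver"),
    (5000, 99999, "gold"),
]


def range_lookup(reference: List[Tuple[int, int, str]], value: int) -> Optional[str]:
    """Return the label of the range that contains `value`, or None if no range matches."""
    lows = [low for low, _, _ in reference]
    # Find the last range whose low bound is <= value.
    i = bisect_right(lows, value) - 1
    if i >= 0 and reference[i][0] <= value <= reference[i][1]:
        return reference[i][2]
    return None


tier = range_lookup(bands, 1000)
```

A value below the first range or above the last one comes back as `None`, which is the sketch's equivalent of a failed lookup.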

Platforms

These are the platforms for the released Windows version and the yet to be released Linux/Unix version along with the C++ compiler that you only need for parallel jobs that will use transformers. You do not need this compiler for Server Edition.

  • Windows 2003 SP1
    • Visual Studio .NET 2003 C++, Visual Studio .NET 2005 C++ or Visual Studio .NET 2005 Express Edition C++
  • AIX 5.2 & 5.3
    • XL C/C++ Enterprise Edition 7.0, 8.0 compiler
  • HP-UX 11i v1 & v2
    • aC++ A.03.63 compiler
  • Red Hat ASE 4.0
    • gcc 3.23 compiler
  • SuSE ES 9.0
    • gcc 3.3.3 compiler
  • Solaris 2.9 & 2.10
    • Sun Studio 9, 10, 11 compiler

Database Repository

Note the database compatibility for the Metadata Server repository is the latest versions of the three DBMS engines. DB2 is an optional extra in the bundle if you don't want to use an existing database.

  • IBM UDB DB2 ESE 9
    • IBM Information Server does not support the Database Partitioning Feature (DPF) for use in the repository layer.
    • DB2 Restricted Enterprise Edition 9 is included with IBM Information Server and is an optional part of the installation. However, its use is restricted to hosting the IBM Information Server repository layer and it cannot be used for other applications.
  • Oracle 10g
  • SQL Server 2005

If you are a cheapskate and you really don't like DB2, in fact you would cross the street if you saw it coming in the other direction, you might be able to load the repository into a free (express) version of SQL Server or Oracle. However, you might hit a problem with the DBMS license CPU restriction. If you get this working drop me a comment.

Languages

Foreign language support for the graphical tools and product messages:

Chinese (Simplified and Traditional), Czech, Danish, Finnish, French, German, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish and Swedish.

1 comment:

  1. does anyone know how much it cost to do the data stage upgrade from 7.x to 8.x.
