当前位置:文档之家› DataStage使用手册

DataStage使用手册

DataStage使用手册
DataStage使用手册

Ascential DataStage

Director Guide Version 7.5.1

Part No. 00D-007DS751

December 2004

his document, and the software described or referenced in it, are confidential and proprietary to Ascential Software Corporation ("Ascential"). They are provided under, and are subject to, the terms and conditions of a license agreement between Ascential and the licensee, and may not be transferred, disclosed, or otherwise provided to third parties, unless otherwise permitted by that agreement. No portion of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of Ascential. The specifications and other information contained in this document for some purposes may not be complete, current, or correct, and are subject to change without notice. NO REPRESENTATION OR OTHER AFFIRMATION OF FACT CONTAINED IN THIS DOCUMENT, INCLUDING WITHOUT LIMITATION STATEMENTS REGARDING CAPACITY, PERFORMANCE, OR SUITABILITY FOR USE OF PRODUCTS OR SOFTWARE DESCRIBED HEREIN, SHALL BE DEEMED TO BE A WARRANTY BY ASCENTIAL FOR ANY PURPOSE OR GIVE RISE TO ANY LIABILITY OF ASCENTIAL WHATSOEVER. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO EVENT SHALL ASCENTIAL BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. If you are acquiring this software on behalf of the U.S. government, the Government shall have only "Restricted Rights" in the software and related documentation as defined in the Federal Acquisition Regulations (FARs) in Clause

52.227.19 (c) (2). If you are acquiring the software on behalf of the Department of Defense, the software shall be classified as "Commercial Computer Software" and the Government shall have only "Restricted Rights" as defined in Clause 252.227-7013 (c) (1) of DFARs.

This product or the use thereof may be covered by or is licensed under one or more of the following issued patents: US6604110, US5727158, US5909681, US5995980, US6272449, US6289474, US6311265, US6330008,

US6347310, US6415286; Australian Patent No. 704678; Canadian Patent No. 2205660; European Patent No. 799450; Japanese Patent No. 11500247.

? 2005 Ascential Software Corporation. All rights reserved. DataStage?, EasyLogic?, EasyPath?, Enterprise Data Quality Management?, Iterations?, Matchware?, Mercator?, MetaBroker?, Application Integration, Simplified?, Ascential?, Ascential AuditStage?, Ascential DataStage?, Ascential ProfileStage?, Ascential QualityStage?, Ascential Enterprise Integration Suite?, Ascential Real-time Integration Services?, Ascential MetaStage?, and Ascential RTI? are trademarks of Ascential Software Corporation or its affiliates and may be registered in the United States or other jurisdictions.

The software delivered to Licensee may contain third-party software code. See Legal Notices (legalnotices.pdf) for more information.

How to Use this Guide Ascential DataStage? is a powerful software suite that is used to

develop and run DataStage jobs. A DataStage job can extract from

different sources, and then cleanse, integrate, and transform the data

according to your requirements. The clean data is ready to be

imported into a data warehouse for analysis and processing by

business information software.

This manual describes the DataStage Director, the DataStage

component that is used to validate, schedule, run, and monitor

DataStage server jobs and parallel jobs. For information about how to

perform these tasks for DataStage mainframe jobs, refer to the

documentation supplied with the mainframe computer. (For a brief

explanation of server, parallel, and mainframe jobs, refer to

"DataStage Projects and Jobs" on page1-3.)

To use this manual you should be familiar with the Windows 2000 or

Windows XP interface, but no other special skills or knowledge are

required.

To find particular topics you can:

Use the Guide’s contents list (at the beginning of the Guide).

Use the Guide’s index (at the end of the Guide).

Use the Adobe Acrobat Reader bookmarks.

Use the Adobe Acrobat Reader search facility (select Edit ?

Search).

The guide contains links both to other topics within the guide, and to

other guides in the DataStage manual set. The links are shown in blue.

Note that, if you follow a link to another manual, you will jump to that

manual and lose your place in this manual. Such links are shown in

italics.

Director Guide iii

Organization of This Manual How to Use this Guide iv Director Guide

Organization of This Manual

This manual contains the following:

Chapter 1 contains an overview of DataStage ? and its component

parts.

Chapter 2 describes the DataStage Director and how to use it.

Chapter 3 covers how to validate, run, delete, schedule, and

administer DataStage server jobs.

Chapter 4 describes job batches.

Chapter 5 describes how to monitor a running server job.

Chapter 6 describes the job log file and the Job Log view.

The Glossary defines terms that have specific meaning in

DataStage. Documentation Conventions

This manual uses the following conventions: Convention

Usage Bold In syntax, bold indicates commands, function names, keywords,

and options that must be input exactly as shown. In text, bold

indicates keys to press, function names, and menu selections.

UPPERCASE In syntax, uppercase indicates BASIC statements and functions

and SQL statements and keywords.

Italic In syntax, italic indicates information that you supply. In text,

italic also indicates UNIX commands and options, file names,

and pathnames.

Plain In text, plain indicates Windows commands and options, file

names, and path names.

Lucida Typewriter The Lucida Typewriter font indicates examples of source code

and system output.

Lucida Typewriter In examples, Lucida Typewriter bold indicates characters that

the user types or keys the user presses (for example,

).

[ ]Brackets enclose optional items. Do not type the brackets unless

indicated.

{ }

Braces enclose nonoptional items from which you must select

at least one. Do not type the braces.

How to Use this Guide DataStage Documentation

Director Guide v DataStage Documentation

DataStage documentation includes the following:

DataStage Director Guide : This guide describes the DataStage

Director and how to validate, schedule, run, and monitor

DataStage server jobs.

DataStage Manager Guide : This guide describes the DataStage

Manager and describes how to use and maintain the DataStage

Repository.

DataStage Designer Guide : This guide describes the DataStage

Designer, and gives a general description of how to create, design,

and develop a DataStage application.

DataStage Server: Server Job Developer’s Guide : This guide

describes the tools that are used in building a server job, and it

supplies programmer’s reference information.

DataStage Enterprise Edition: Parallel Job Developer’s

Guide : This guide describes the tools that are used in building a

parallel job, and it supplies programmer’s reference information.

DataStage Enterprise Edition: Parallel Job Advanced

Developer’s Guide : This guide gives more specialized

information about parallel job design.

DataStage Enterprise MVS Edition: Mainframe Job

Developer’s Guide : This guide describes the tools that are used

in building a mainframe job, and it supplies programmer’s

reference information.

DataStage Administrator Guide : This guide describes

DataStage setup, routine housekeeping, and administration.itemA | itemB

A vertical bar separating items indicates that you can choose only one item. Do not type the vertical bar....

Three periods indicate that more of the same type of item can optionally follow.? A right arrow between menu commands indicates you should choose each command in sequence. For example, “Choose File

? Exit” means you should choose File from the menu bar, then choose Exit from the File pull-down menu.

This line ? continues

The continuation character is used in source code examples to

indicate a line that is too long to fit on the page, but must be

entered as a single line on screen.Convention

Usage

DataStage Documentation How to Use this Guide

vi Director Guide

DataStage Install and Upgrade Guide . This guide contains

instructions for installing DataStage on Windows and UNIX

platforms, and for upgrading existing installations of DataStage.

DataStage NLS Guide . This Guide contains information about

using the NLS features that are available in DataStage when NLS

is installed.These guides are also available online in PDF format. You can read

them using the Adobe Acrobat Reader supplied with DataStage. See

Install and Upgrade Guide for details on installing the manuals and

the Adobe Acrobat Reader.

You can use the Acrobat search facilities to search the whole

DataStage document set. To use this feature, select Edit ? Search

then choose the All PDF documents in option and specify the

DataStage docs directory (by default this is C:\Program

Files\Ascential\DataStage\Docs).

Extensive online help is also supplied. This is particularly useful when

you have become familiar with DataStage, and need to look up

specific information.

Contents

How to Use this Guide

Organization of This Manual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .iv

Documentation Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .iv

DataStage Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-v Chapter 1

Introducing DataStage

What Is a Data Warehouse? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1

Why Do I Need One? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1

What Does DataStage Do? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2

How Is DataStage Packaged? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2

DataStage Projects and Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3 Chapter 2

The DataStage Director

Starting the DataStage Director . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1

The DataStage Director Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3 Job Category Pane. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3

Display Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3

Menu Bar. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4

Toolbar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5

Status Bar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5 Job Status View. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6 Job States. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7

Job Status Details. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7 Director Guide vii

Contents

Shortcut Menus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8 Shortcut Menus in the Job Status View. . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9

Shortcut Menus in the Job Log View. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9

Shortcut Menus in the Job Schedule View . . . . . . . . . . . . . . . . . . . . . . . . 2-10

Shortcut Menus in the Job Category Pane . . . . . . . . . . . . . . . . . . . . . . . . 2-10

Shortcut Menu in the Monitor Window. . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10 Filtering the Job Status or Job Schedule View. . . . . . . . . . . . . . . . . . . . . . . . 2-11 Examples of Filtering by Job Name. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12 Finding Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13

Sorting Columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15

Printing the Current View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15 What Is in the Printout?. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16

Changing the Printer Setup. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17 Director Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17 General Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-18

Limits Page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-19

View Page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-20

Priority Page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-21 Choosing an Alternative Project. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-22 Viewing Jobs in Another Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-22

Viewing Jobs on a Different Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-22 Exiting the DataStage Director. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-23

Chapter 3

Running DataStage Jobs

Setting Job Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1

Validating a Job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3

Running a Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-4

Stopping a Job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-5

Resetting a Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-5

Restarting Job Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6

Setting Default Job Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6

Job Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7 Job Schedule View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8

Viewing Details of a Job Schedule. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9

Scheduling a Job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-11

Unscheduling a Job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12

Rescheduling a Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12 viii Director Guide

Contents

Deleting a Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-13

Job Administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14 Cleaning Up Job Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14

Clearing a Job Status File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-17 Multiple Job Invocations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-17

Setting Tracing Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-20

Generating Operational Meta Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-21

Disabling Message Handlers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-21 Chapter 4

Job Batches

What Is a Job Batch?. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1

Creating a Job Batch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1

Running a Job Batch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3

Scheduling a Job Batch. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4

Unscheduling a Job Batch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5

Rescheduling a Job Batch. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5

Job Schedule Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6

Editing a Job Batch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6

Copying a Job Batch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7

Deleting a Job Batch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7 Chapter 5

Monitoring Jobs

The Monitor Window. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1 Monitor Shortcut Menu. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4

Setting the Server Update Interval. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-5 Switching Between Monitor Windows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6

The Stage Status Window. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6 Chapter 6

The Job Log File

Job Log View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1

The Event Detail Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3

Viewing Related Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5

Filtering the Job Log View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5 Director Guide ix

Contents

Purging Log File Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-7 Purging Log Entries Immediately. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8

Purging Log Entries Automatically. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8 Message Handlers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9 Adding Rules to Message Handlers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10

Managing Message Handlers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-12 Glossary

x Director Guide

1

Introducing DataStage Many organizations want to make better use of their data. But that

data may be stored in different formats in different types of database.

Some data sources may be dormant archives, others may be busy

operational databases. Extracting and cleaning data from these varied

sources has always been time-consuming and costly – until now.

DataStage makes it simple to design and develop efficient

applications that make data warehousing a reality where it was

impossible before.

What Is a Data Warehouse?

A data warehouse is a central database containing copies of data from

all the operational sources and archive systems in an organization.

But the database does not have to be large. Instead of storing details

of every transaction, order, or set of sales figures, the data warehouse

stores totals, averages, area figures, and so on. This data is structured

to make it easy to query and to generate reports.

Inside the data warehouse you can perform analyses that would be

impractical on a working database. This means that anyone who

needs access to the data gets all the information they want, and only

the information they want. The data warehouse can be created or

updated at any time, with minimum disruption to working systems.

Why Do I Need One?

Working databases are busy. It is hard to gain an accurate picture of

the contents of the database at any time because it changes

Director Guide1-1

What Does DataStage Do?Introducing DataStage

1-2

Director Guide frequently. By transferring working data into a data warehouse, you

can take snapshots of what is going on. Also, working databases

contain dirty data – records in different formats, with key values

missing or out of range, and so on. In a fast-moving working database

it is difficult to trap mistakes or incomplete entries. Using DataStage,

you can cleanse data before loading it into a data warehouse,

ensuring that your business decisions are based only on valid

information.

As well as a working database, you may have archive systems or

incompatible data sources that you have inherited. These may be

static, but inaccessible because their format is different from your

working system. You can use DataStage to transform this data into

compatible formats that can be stored in the data warehouse.

What Does DataStage Do?

DataStage comes in between your data and your data warehouse.

DataStage jobs process the data to meet your needs, including:

Extraction – DataStage takes data from indexed files, sequential

files, networked databases, archives, and external data sources

and stores it in the data warehouse.

Aggregation – DataStage takes your working data, calculates

totals and averages, then stores it in the data warehouse. This

means you have to store much less data, which is quicker and

easier to access.

Transformation – DataStage converts inconsistent data into the

required format and loads it into the data warehouse.How Is DataStage Packaged?

DataStage is client/server software. The server holds the data while it

is being processed. The client is the interface to DataStage that is used

for designing and running jobs, or managing the data in the

Repository. The client components include:

DataStage Designer , for creating DataStage server and

mainframe jobs. Server jobs are compiled into executable

programs that are scheduled by the DataStage Director and run by

the DataStage Server. Mainframe jobs are downloaded from the

Designer to mainframe computers, where they are compiled and

run by mainframe tools.

DataStage Director , for running DataStage server jobs.

Introducing DataStage DataStage Projects and Jobs

Director Guide

1-3 DataStage Manager , for viewing and editing the contents of the

Repository.

DataStage Administrator , for administering DataStage projects

and conducting housekeeping on the server. The client and server components installed depend on the edition of

DataStage you have purchased. DataStage is packaged in two ways:

Developer’s Edition , used by developers to design, develop,

and create executable DataStage jobs. The Developer’s Edition

contains all the client components described earlier and the

server. The developer’s role, DataStage Designer, and DataStage

Manager are described in DataStage Designer Guide and

DataStage Manager Guide .

Operator’s Edition , used by operators to validate, schedule, run,

and monitor DataStage jobs that have been developed elsewhere.

The Operator’s Edition contains DataStage Director, DataStage

Administrator, and DataStage Server components only. DataStage Projects and Jobs

You always enter DataStage through a DataStage project. When you

start a DataStage client you are prompted to attach to a project. Each

project contains DataStage jobs and the components required to

develop or run them.

DataStage jobs are made up of individual stages. A stage represents a

data source or a process. For example, one stage may extract data

from a data source, while another transforms it. The data required at

each stage and how it is handled is specified in the job design. When

the job is run, the processing described in the job design is

performed. Variable parameters such as file names, dates, and so on,

can be specified when the job is run. DataStage jobs can be exported

for use on other DataStage systems.

DataStage supports three types of job:

Server jobs are both developed and compiled using DataStage

client tools. Compilation of a server job creates an executable that

is scheduled and run from the DataStage Director.

Parallel jobs . These are compiled and run on the DataStage

server in a similar way to server jobs, but support parallel

processing on SMP , MPP , and cluster systems.

Mainframe jobs are developed using the same DataStage client

tools as for server jobs, but compilation and execution occur on a

mainframe computer. The DataStage Designer generates a

COBOL source file and supporting JCL script, then lets you upload

DataStage Projects and Jobs Introducing DataStage

1-4Director Guide

them to the target mainframe computer. The job is compiled and

run on the mainframe computer under the control of native

mainframe software.

There are also:

Job Sequences . A job sequence allows you to specify a

sequence of DataStage jobs to be executed, and actions to take

depending on results.

2

The DataStage Director The DataStage Director is the client component that validates, runs,

schedules, and monitors jobs run by the DataStage Server. It is the

starting point for most of the tasks a DataStage operator needs to do

in respect of DataStage jobs.

Note DataStage mainframe jobs run on a mainframe computer,

and use mainframe-specific tools. These jobs are not visible

in the DataStage Director. In this manual the term job

therefore refers to DataStage server and parallel jobs only.

For information about running DataStage mainframe jobs,

consult the documentation supplied with your mainframe

software.

This chapter describes the interface to the DataStage Director and

how to use it, including:

Starting the DataStage Director

Using the DataStage Director window

Finding text in the DataStage Director window, sorting data, and printing out the display

Setting options and defaults for the DataStage Director window

and for jobs you want to run

Switching between projects and exiting the DataStage Director Starting the DataStage Director

To start the DataStage Director:

Director Guide2-1

Starting the DataStage Director The DataStage Director

2-2Director Guide

1Choose Start ? Programs ? Ascential DataStage ?

DataStage Director , or choose the appropriate program folder if

you installed DataStage elsewhere. The Attach to Project dialog

box appears:

2

Enter the name of your host in the Host system field. This is the name of the system where the DataStage Server is installed.3

Enter your user name in the User name field. This is your user name on the server system.4Enter your password in the Password field. If you are connecting

to a Windows server via LAN Manager, you can select the Omit

check box. The User name and Password fields gray out and you

log on to the server using your current Windows account details.

Warning

Think carefully before using the Omit option to log on to DataStage. If you use this option, note that:–

You cannot specify UNC pathnames in DataStage clients.–

You must specify the host name in uppercase.–

You cannot access remote files.–You may get errors if you import meta data from an ODBC data

source if you are not logged on to the same domain as the

server.

5Enter the name of the project you want to use or choose one from

the Project list, which displays all the projects installed on the

server.

6

Click OK . The DataStage Director window appears.

Note You can also start the DataStage Director from the

DataStage Designer or the DataStage Manager by

choosing Tools ? Run Director . You are automatically

attached to the same project and you do not see the

Attach to Project dialog box. For more information

about the DataStage Designer and Manager, see

The DataStage Director The DataStage Director Window

Director Guide 2-3DataStage Designer Guide and DataStage Manager

Guide .

The DataStage Director Window

The DataStage Director window appears when you start the Director:

This section describes the features of the DataStage Director window

including:

The job category pane

The display area

The menu bar

The toolbar The status bar

Job Category Pane

The left pane of the DataStage Director window is the job category

pane. It displays the job category tree, which lists job categories and

subcategories that contain server jobs. The jobs in the currently

selected category are listed in the display area. You can hide the job

category pane by choosing View ? Show Categories .

Display Area

The display area is the main part of the DataStage Director window.

There are three views:

Job Status. The default view, which appears in the right pane of

the DataStage Director window. It displays the status of all jobs in

the category currently selected in the job category tree. If you hide

The DataStage Director Window The DataStage Director

2-4Director Guide

the job category pane, the Job Status view includes a Category

column, and displays the status of all server jobs in the current

project, regardless of their category. See "Job Status View" on

page 2-6 for more information.

Job Schedule. Displays a summary of scheduled jobs and

batches in the currently selected job category. If the job category

pane is hidden, the display area shows all scheduled jobs and

batches, regardless of their category. See Chapter 3, "Running

DataStage Jobs," for a description of this view. To switch to the

Job Schedule view, choose View ? Schedule , or click the

Schedule button on the toolbar.

Job Log. Displays the log file for a job chosen from the Job

Status view or the Job Schedule view. The job category pane is

always hidden. See Chapter 6, "The Job Log File," for more details.

To switch to this view, choose View ? Log , or click the Log

button on the toolbar.Updating the Display

You can set how often the display area is updated from the server by

editing the Director options. For more information, see "Director

Options" on page 2-17.

You can also update the screen immediately by doing one of the

following:

Choose View ? Refresh . Choose Refresh from the shortcut menus (see "Shortcut Menus"

on page 2-8 for more details).

Press Ctrl-R .Each entry in the display area represents a job, scheduled job, or

event in the job log, depending on the current view. An icon is

displayed by default for each entry. To hide the icons, choose Tools ?

Options ? View .

Note You can increase the refresh rate by organizing jobs within

job categories, so that you do not display more jobs than

necessary in the display area. For information about how to

organize jobs within categories, refer to DataStage Designer

Guide .

Menu Bar

The menu bar has six pull-down menus that give access to all the

functions of the Director:

Project. Opens an alternative project and sets up printing.

The DataStage Director The DataStage Director Window

Director Guide 2-5 View. Displays or hides the toolbar, status bar, buttons, or job

category pane, specifies the sorting order, changes the view,

filters entries, shows further details of entries, and refreshes the

screen.

Search. Starts a text search dialog box.

Job. Validates, runs, schedules, stops, and resets jobs, purges old

entries from the job log file, deletes unwanted jobs, cleans up job

resources (if the administrator has enabled this option), and

allows you to set default job parameter values.

Tools. Monitors running jobs, manages job batches, and starts

the DataStage Designer and DataStage Manager. It also starts

MetaStage Explorer and Quality Manager, if these components

are installed on the system, and custom software. If you are

running parallel jobs on a UNIX server, allows you to manage data

sets (see "Managing Data Sets" in Parallel Job Developer’s Guide

for details).

Help. Invokes the Help system. You can also get help from any

screen or dialog box in the DataStage Director.Toolbar

The toolbar gives quick access to the main functions of the DataStage

Director.

The toolbar is displayed by default, but can be hidden by choosing

View ? Toolbar or by changing the Director options . See

"Director Options" on page 2-17 for more details . To display

ToolTips, let the cursor rest on a button in the toolbar.

Status Bar

The status bar appears at the bottom of the DataStage Director

window and displays the following information:

The name of a job (if you are displaying the Job Log view).

Job Run a Job Find Sort -Reset a Help

Reschedule a Job Open

Job

Job Ascending Log Status Project a Job Stop a Job Print

View Job Schedule Sort -

Descending

Job Status View The DataStage Director

2-6Director Guide

The number of entries in the display. If you look at the Job Status

or Job Schedule view and use the Filter Entries… command, this

panel specifies the number of lines that meet the filter criteria. If

you have set a filter then (filtered) or (limited) is displayed.

The date and time on the DataStage server.Note Under certain circumstances, the number of entries in the

display is replaced by the last error message issued by the

server. The message disappears when the screen is

refreshed.

Job Status View

The Job Status view is the default view displayed when you start the

DataStage Director (see "The DataStage Director Window" on

page 2-3). You can change the default view by editing the Director

options (see "Director Options" on page 2-17 for more details). You

can also filter the jobs displayed in the view (see "Filtering the Job

Status or Job Schedule View" on page 2-11).

The Job Status view shows the status of all the jobs in the currently

selected job category, or, if the job category pane is hidden, in the

current project. The view has the following columns:Column Description Job name

The name of the job.Status

The status of the job. See "Job States" for the possible job states and what they mean.Started On date

The time and date a job was started. These fields are only filled in for a job with a status of https://www.doczj.com/doc/ad2829238.html,st ran On date The time and date the job was finished, stopped, or

aborted. These columns are blank for jobs that have

never been run.

Description

A description of the job, if available.

Datastage 安装后启动was失败

按照安装教程安装虚拟机版的datastage 8.7后,使用命令启动was失败 [plain]view plain copy https://www.doczj.com/doc/ad2829238.html,srvr:~ # /opt/IBM/WebSphere/AppServer/bin/startServer.sh server1 2.ADMU0116I: Tool information is being logged in file 3. /opt/IBM/WebSphere/AppServer/profiles/InfoSphere/logs/server1/sta rtServer.log 4.ADMU0128I: Starting tool with the InfoSphere profile 5.ADMU3100I: Reading configuration for server: server1 6.ADMU3200I: Server launched. Waiting for initialization status. 7.ADMU3011E: Server launched but failed initialization. startServer.log, 8. SystemOut.log(or job log in zOS) and other log files under 9. /opt/IBM/WebSphere/AppServer/profiles/InfoSphere/logs/server1 sho uld 10. contain failure information. 按照提示查看报错日志: [html]view plain copy https://www.doczj.com/doc/ad2829238.html,srvr:/opt/IBM/WebSphere/AppServer/profiles/InfoSphere/logs/server1 # tai l -100 SystemErr.log 2. at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 3. at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorI mpl.java:60) 4. at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodA ccessorImpl.java:37) 5. at https://www.doczj.com/doc/ad2829238.html,ng.reflect.Method.invoke(Method.java:611) 6. at https://www.doczj.com/doc/ad2829238.html,uncher.Main.invokeFramework(Main.java:340) 7. at https://www.doczj.com/doc/ad2829238.html,uncher.Main.basicRun(Main.java:282) 8. at https://www.doczj.com/doc/ad2829238.html,uncher.Main.run(Main.java:981) 9. at https://www.doczj.com/doc/ad2829238.html,unchEclipse(WSPreLauncher .java:340) 10. at com.ibm.wsspi.bootstrap.WSPreLauncher.main(WSPreLauncher.java:110 ) 11.Caused by: https://www.doczj.com/doc/ad2829238.html,.ascential.xmeta.repository.core.CoreRepositoryException: Error initializ ing persistence manager module 13. at com.ascential.xmeta.repository.core.impl.DefaultSandbox.(De faultSandbox.java:70) 14. at https://www.doczj.com/doc/ad2829238.html,ng.J9VMInternals.newInstanceImpl(Native Method)

etl教程

ETL本质 做数据仓库系统,ETL是关键的一环。说大了,ETL是数据整合解决方案,说小了,就是倒数据的工具。回忆一下工作这么些年来,处理数据迁移、转换的工作倒还真的不少。但是那些工作基本上是一次性工作或者很小数据量,使用access、DTS或是自己编个小程序搞定。可是在数据仓库系统中,ETL上升到了一定的理论高度,和原来小打小闹的工具使用不同了。究竟什么不同,从名字上就可以看到,人家已经将倒数据的过程分成3个步骤,E、T、L分别代表抽取、转换和装载。 其实ETL过程就是数据流动的过程,从不同的数据源流向不同的目标数据。但在数据仓库中,ETL 有几个特点,一是数据同步,它不是一次性倒完数据就拉到,它是经常性的活动,按照固定周期运行的,甚至现在还有人提出了实时ETL的概念。二是数据量,一般都是巨大的,值得你将数据流动的过程拆分成E、T和L。 现在有很多成熟的工具提供ETL功能,例如datastage、powermart等,且不说他们的好坏。从应用角度来说,ETL的过程其实不是非常复杂,这些工具给数据仓库工程带来和很大的便利性,特别是开发的便利和维护的便利。但另一方面,开发人员容易迷失在这些工具中。举个例子,VB是一种非常简单的语言并且也是非常易用的编程工具,上手特别快,但是真正VB的高手有多少?微软设计的产品通常有个原则是“将使用者当作傻瓜”,在这个原则下,微软的东西确实非常好用,但是对于开发者,如果你自己也将自己当作傻瓜,那就真的傻了。ETL工具也是一样,这些工具为我们提供图形化界面,让我们将主要的精力放在规则上,以期提高开发效率。从使用效果来说,确实使用这些工具能够非常快速地构建一个job来处理某个数据,不过从整体来看,并不见得他的整体效率会高多少。问题主要不是出在工具上,而是在设计、开发人员上。他们迷失在工具中,没有去探求ETL的本质。 可以说这些工具应用了这么长时间,在这么多项目、环境中应用,它必然有它成功之处,它必定体现了ETL的本质。如果我们不透过表面这些工具的简单使用去看它背后蕴涵的思想,最终我们作出来的东西也就是一个个独立的job,将他们整合起来仍然有巨大的工作量。大家都知道“理论与实践相结合”,如果在一个领域有所超越,必须要在理论水平上达到一定的高度 探求ETL本质之一 ETL的过程就是数据流动的过程,从不同异构数据源流向统一的目标数据。其间,数据的抽取、清洗、转换和装载形成串行或并行的过程。ETL的核心还是在于T这个过程,也就是转换,而抽取和装载一般可以作为转换的输入和输出,或者,它们作为一个单独的部件,其复杂度没有转换部件高。和OLTP系统中不同,那里充满这单条记录的insert、update和select等操作,ETL过程一般都是批量操作,例如它的装载多采用批量装载工具,一般都是DBMS系统自身附带的工具,例如Oracle SQLLoader和DB2的autoloader 等。 ETL本身有一些特点,在一些工具中都有体现,下面以datastage和powermart举例来说。 1、静态的ETL单元和动态的ETL单元实例;一次转换指明了某种格式的数据如何格式化成另一种格式的数据,对于数据源的物理形式在设计时可以不用指定,它可以在运行时,当这个ETL单元创建一个实例时才指定。对于静态和动态的ETL单元,Datastage没有严格区分,它的一个Job就是实现这个功能,在早期版本,一个Job同时不能运行两次,所以一个Job相当于一个实例,在后期版本,它支持multiple instances,而且还不是默认选项。Powermart中将这两个概念加以区分,静态的叫做Mapping,动态运行时叫做Session。 2、ETL元数据;元数据是描述数据的数据,他的含义非常广泛,这里仅指ETL的元数据。主要包括每次转换前后的数据结构和转换的规则。ETL元数据还包括形式参数的管理,形式参数的ETL单元定义的参数,相对还有实参,它是运行时指定的参数,实参不在元数据管理范围之内。

主流BI产品对比

国际主流BI产品对比

厂商产品及简介 国际厂商(主要) MicroStrategy MSTR ,国际专业BI 产品,覆盖BI 全部领域 IBM DB2以及Cognos 、SPSS 、DataStage ,覆盖BI 全部领域Oracle BIEE 、Hyperion ,覆盖BI 全部领域,数据挖掘领域有待加强 Microsoft SQLServer ,覆盖BI 全部领域,适合中小型企业,性价比高 SAP BusinessObjects 、CrystalReports 主要是报表领域和数据集成领域 国际BI 市场主要厂商

BI 产品纷纷嫁入豪门: 2007年11月,IBM收购Cognos 2008年4月,Oracle收购Hyperion 2010年10月,SAP收购Business Objects BI 产品国际阵营谁是幸存者: 目前BI产品第一阵营的唯一幸存者只有MicroStrategy,超过20年的专业技术和市场积累,让这个在巨头环伺下的BI行业领军产品一直保持着一枝独秀的良好态势。

厂商名称目标客户群 MicroStrategy金融、电信、政府、石油、电力等高端行业的高端应用,尤 其适合于数据量大,用户分布广泛的行业应用特点 SAP/BO BO定位于SAP ERP的已有用户优先实施,其它则通过OEM或 各种集成商,价格较高,不适用于中小企业 IBM/Cognos通过OEM和集成商进军企业客户,公司本身则注重已有的金 融、电信、政务领域客户 Microsoft适用于中小企业,依靠合作伙伴 Oracle基于Oracle数据库庞大的客户群,注重大型用户,但内部产 品有竞争关系 国际主流BI产品基本都已被IT业界巨头并购,技术路线及商务策略缺乏独立性,除MicroStrategy之外都缺乏BI产品技术发展方向的独立规划。

datastage入门教程

简介 DataStage 使用了Client-Server 架构,服务器端存储所有的项目和元数据,客户端DataStage Designer 为整个ETL 过程提供了一个图形化的开发环境,用所见即所得的方式设计数据的抽取清洗转换整合和加载的过程。Datastage 的可运行单元是Datastage Job ,用户在Designer 中对Datastage Job 的进行设计和开发。 Datastage 中的Job 分为Server Job, Parallel Job 和Mainframe Job ,其中 Mainframe Job 专供大型机上用,常用到的Job 为Server Job 和Parallel Job 。 本文将介绍如何使用Server Job 和Parallel Job 进行ETL 开发。 Server Job 一个Job 就是一个Datastage 的可运行单元。Server Job 是最简单常用的Job 类型,它使用拖拽的方式将基本的设计单元-Stage 拖拽到工作区中,并通过连线的方式代表数据的流向。通过Server Job,可以实现以下功能。 1.定义数据如何抽取 2.定义数据流程 3.定义数据的集合 4.定义数据的转换 5.定义数据的约束条件 6.定义数据的聚载 7.定义数据的写入 Parallel Job Server Job 简单而强大,适合快速开发ETL 流程。Parallel Job 与Server Job 的不同点在于其提供了并行机制,在支持多节点的情况下可以迅速提高数据处理效率。Parallel Job 中包含更多的Stage 并用于不同的需求,每种Stage 使用上的限制也往往大于Server Job。 Sequence Job Sequence Job 用于Job 之间的协同控制,使用图形化的方式来将多个Job 汇集在一起,并指定了Job 之间的执行顺序,逻辑关系和出错处理等。 数据源的连接 DataStage 能够直接连接非常多的数据源,应用范围非常大,可连接的数据源包括: ?文本文件 ?XML 文件

datastage入门培训

一、工具入门 DataStage是一个ETL的工具,就是对数据的抽取,转换,加载。个人通俗的理解就是一个对数据进行处理,提取的工具,这里面的数据大部分是以数据库中表的格式存在着的,所以如果要使用这个工具,首先必须对关系数据库的一些基本概念要有所了解,比如最基本的字段,键,记录等概念。 DataStage是通过设计job来实现ETL的功能的。 Job的设计跟普通的IDE设计一样,通过拖拽控件,并填加脚本来完成。这里的控件称为stage,每一个不同的stage都有不同的数据处理的功能,将各个stage通过一定的方式组合起来,设计成job,对job进行编译,运行,就能够实现对数据抽取转换加载。 1,安装datastage,看学习指导,先对该工具有个大概的认识,大概知道administrator,design,director,manager的区别。 了解datastage工具的主要用途:简单的说就是把一批数据input进来,经过各种各样的转化,清洗,然后在output出去,整个就是ETL 的过程。 对4个工具我们最常做的操作有: Administrator:1、对Project的管理,主要是建立和删除project; 2、对Licensing的管理,主要是更换Licensing。 design:datastage的核心,所有的开发都在design里面完成,在这里可以编辑你的job,使用各种stage控件。 director:1、查看日志,当运行job结束时,无论job成功或者失败,我们都可以在director 里面查看日志,里面能反映我们job运行的状态,经常job出错我们都是先查看日志,然后分析原因,再到design里面修改。 2、director的另外一个很有用的功能是logout job,当服务器或者网络出问题时,正在编辑的job很有可能被锁定,这时你就算把design关了再重新登陆还是无法打开job,会提示job has been used, 这就需要到director里面把job logout,然后就可以使用了。manage:manage的最主要的功能是可以对design里面的资源进行导入导出,当我们要把开发的job从一台机器转移到另外一台机器时,就需要用到。 二、开始学习使用design,做一些简单的job,接触几个常用的stage。 做练习1的1-2至4-2的练习,练习中用到的Oracle组件全部用sequence file 代替, 1-2练习中会教你导入练习所要用到的表的结构,练习中要用到的数据文件放在数据及表定义目录下。(表定义可以通过manage工具导入,但是数据文件必须自己手工导入,所以开发前请先将数据及表定义目录下面的所有.txt的数据文件导到你所使用的datastage的开发环境上,导数据文件的方法可以使用ftp工具) 要设计job的关键,就在于能够熟悉每个不同的stage并且能够灵活运用。在文档和指导中有对每个控件的使用方法作了图文并茂的说明,但是教材语言的一个缺点就是太过形式化,所以有些概念不能够很好的理解。比如lookup这个stage我在看教材的时候就没有太了解。所以,我就结合自己,用自己的语言对一些比较常用的stage说一下自己的理解和一些需要注意的地方。 几个常用stage的经验总结: Sequential File Stage:这个控件实际上是指代主机上面的一个文件,在它的属性中可以选定文件的路径,目录。一般这些文件都是以类似数据库表的格式存在的。使用这个控

datastage入门教程

简介 DataStage 使用了 Client-Server 架构,服务器端存储所有的项目和元数据,客户端 DataStage Designer 为整个 ETL 过程提供了一个图形化的开发环境,用所见即所得的方式设计数据的抽取清洗转换整合和加载的过程。Datastage 的可运行单元是 Datastage Job ,用户在 Designer 中对 Datastage Job 的进行设计和开发。Datastage 中的 Job 分为 Server Job, Parallel Job 和 Mainframe Job ,其中 Mainframe Job 专供大型机上用,常用到的 Job 为Server Job 和 Parallel Job 。本文将介绍如何使用 Server Job 和 Parallel Job 进行 ETL 开发。 Server Job 一个 Job 就是一个 Datastage 的可运行单元。Server Job 是最简单常用的Job 类型,它使用拖拽的方式将基本的设计单元 -Stage 拖拽到工作区中,并通过连线的方式代表数据的流向。通过 Server Job,可以实现以下功能。 1.定义数据如何抽取 2.定义数据流程 3.定义数据的集合 4.定义数据的转换 5.定义数据的约束条件 6.定义数据的聚载 7.定义数据的写入 Parallel Job Server Job 简单而强大,适合快速开发 ETL 流程。Parallel Job 与 Server Job 的不同点在于其提供了并行机制,在支持多节点的情况下可以迅速提高数据处理效率。Parallel Job 中包含更多的 Stage 并用于不同的需求,每种 Stage 使用上的限制也往往大于 Server Job。 Sequence Job Sequence Job 用于 Job 之间的协同控制,使用图形化的方式来将多个 Job 汇集在一起,并指定了 Job 之间的执行顺序,逻辑关系和出错处理等。 数据源的连接 DataStage 能够直接连接非常多的数据源,应用围非常大,可连接的数据源包括:

datastage教程

1、【第一章】datastage简介与工作原理 1、简介 数据中心(数据仓库)中的数据来自于多种业务数据源,这些数据源可能是不同硬件平台上,使用不同的操作系统,数据模型也相差很远,因而数据以不同的方式存在不同的数据库中。如何获取并向数据中心(数据仓库)加载这些数据量大、种类多的数据,已成为建立数据中心(数据仓库)所面临的一个关键问题。针对目前系统的数据来源复杂,而且分析应用尚未成型的现状,专业的数据抽取、转换和装载工具DataStage是最好的选择。 Websphere DataStage 是一套专门对多种操作数据源的数据抽取、转换和维护过程进行简化和自动化,并将其输入数据集市或数据中心(数据仓库)目标数据库的集成工具。 DataStage 能够处理多种数据源的数据,包括主机系统的大型数据库、开放系统上的关系数据库和普通的文件系统等,以下列出它所能处理的主要 数据源: 大型主机系统数据库:IMS,DB2,ADABAS,VSAM 等 开放系统的关系数据库:Informix,Oracle,Sybase,DB2,Microsoft SQL Server等ERP 系统:SAP/R3,PeopleSoft系统等,普通文件和复杂文件系统,FTP 文件系统,XML等IIS,Netscape,Apache等Web服务器系统Outlook等Email系统。 DataStage 可以从多个不同的业务系统中,从多个平台的数据源中抽取数据,完成转换和清洗,装载到各种系统里面。其中每步都可以在图形化工具里完成,同样可以灵活的被外部系统调度,提供专门的设计工具来设计转换规则和清洗规则等,实现了增量抽取、任务调度等多种复杂而实用的功能。其中简单的数据转换可以通过在界面上拖拉操作和调用一些DataStage 预定义转换函数来实现,复杂转换可以通过编写脚本或结合其他语言的扩展来实现,并且DataStage 提供调试环境,可以极大提高开发和调试抽取、转换程序的效率。

大数据处理综合处理服务平台的设计实现分析范文

大数据处理综合处理服务平台的设计与实现 (广州城市职业学院广东广州510405) 摘要:在信息技术高速发展的今天,金融业面临的竞争日趋激烈,信息的高度共享和数据的安全可靠是系统建设中优先考虑的问题。大数据综合处理服务平台支持灵活构建面向数据仓库、实现批量作业的原子化、参数化、操作简单化、流程可控化,并提供灵活、可自定义的程序接口,具有良好的可扩展性。该服务平台以SOA为基础,采用云计算的体系架构,整合多种ETL技术和不同的ETL工具,具有统一、高效、可拓展性。该系统整合金融机构的客户、合约、交易、财务、产品等主要业务数据,提供客户视图、客户关系管理、营销管理、财务分析、质量监控、风险预警、业务流程等功能模块。该研究与设计打破跨国厂商在金融软件方面的垄断地位,促进传统优势企业走新型信息化道路,充分实现了“资源共享、低投入、低消耗、低排放和高效率”,值得大力发展和推广。 关键词:面向金融,大数据,综合处理服务平台。 一、研究的意义 目前,全球IT行业讨论最多的两个议题,一个是大数据分析“Big Data”,一个是云计算“Cloud Computing”。中

国五大国有商业银行发展至今,积累了海量的业务数据,同时还不断的从外界收集数据。据IDC(国际数据公司)预测,用于云计算服务上的支出在接下来的5 年间可能会出现3 倍的增长,占据IT支出增长总量中25%的份额。目前企业的各种业务系统中数据从GB、TB到PB量级呈海量急速增长,相应的存储方式也从单机存储转变为网络存储。传统的信息处理技术和手段,如数据库技术往往只能单纯实现数据的录入、查询、统计等较低层次的功能,无法充分利用和及时更新海量数据,更难以进行综合研究,中国的金融行业也不例外。中国五大国有商业银行发展至今,积累了海量的业务数据,同时还不断的从外界收集数据。通过对不同来源,不同历史阶段的数据进行分析,银行可以甄别有价值潜力的客户群和发现未来金融市场的发展趋势,针对目标客户群的特点和金融市场的需求来研发有竞争力的理财产品。所以,银行对海量数据分析的需求是尤为迫切的。再有,在信息技术高速发展的今天,金融业面临的竞争日趋激烈,信息的高度共享和数据的安全可靠是系统建设中优先考虑的问题。随着国内银行业竞争的加剧,五大国有商业银行不断深化以客户为中心,以优质业务为核心的经营理念,这对银行自身系统的不断完善提出了更高的要求。而“云计算”技术的推出,将成为银行增强数据的安全性和加快信息共享的速度,提高服务质量、降低成本和赢得竞争优势的一大选择。

Datastage 培训资料

Datastage培训 1.什么是Datastage? 设计jobs 抽取(Extraction)、转换(Transformation)、装载(Loading)即ETL 数据整合项目工具,如数据仓库、数据集市和系统移植。 DataStage的框架,如图-1: 图-1 在开发过程中是通过DataStage的四个客户端(DataStage Administrator如图-2, DataStage Manager如图-3, DataStage Designer如图-4, DataStage Director如图-5) 来进行工作的。 图-2 图-3 图-4 图-5 DataStage的基本开发流程: 1.在Administrator中新建工程、定义全局和工程属性 2.在Manager中导入元数据 3.在Designer中定义job 4.在Designer中编译job 5. 在Director中验证,运行,监控job 2.DataStage Administrator介绍 主要功能:对server进行一些常规的设置、用来执行管理任务,如建立DataStage用

户、新建和删除工程,设置工程的属性。 2.1.登陆 登陆后的界面: 在General标签中、可以看到当前server的版本是7.5.1.A,你也可以点击”NLS…”选择Client端的默认字符集。 2.2.新建工程 选择Projects标签,

在这里你可以选择Add按钮来新建一个工程“sjzh”如图: 该工程存放的目录为“/home/dsadm/Ascential/DataStage/Projects/sjzh”在这里我们选择系统的默认路径。选择“OK”就新建了一个工程,如图:

Datastage入门示例

Datastage介绍及示例 1 Datastage 简介 Datastage包含四大部件:Administrator、Manager、Designer、Director。 1.用DataStage Administrator 新建或者删除项目,设置项目的公共属性,比如权限。 2.用DataStage Designer 连接到指定的项目上进行Job的设计; 3.用DataStage Director 负责job的运行,监控等。例如设置设计好的job的调度时间。 4.用DataStage Manager 进行Job的备份等job的管理工作。 2 设计一个JOB示例 2.1 环境准备 目标:将源表中数据调度到目标表中去。 1 数据库:posuser/posuser@WHORADB , ip: 192.168.100.88 2 源表:a_test_from 3 目标表:a_test_to 两者表结构一样,代码参考: create table A_TEST_FROM ( ID INTEGER not null, CR_SHOP_NO CHAR(15), SHOP_NAME VARCHAR2(80), SHOP_TEL CHAR(20), YEAR_INCOME NUMBER(16,2), SHOP_CLOSE_DATE DATE, SHOP_OPEN_DATE DATE ); alter table A_TEST_FROM add constraint TEST primary key (ID); 4. 示例数据: insert into A_TEST_FROM (ID, CR_SHOP_NO, SHOP_NAME, SHOP_TEL, YEAR_INCOME, SHOP_CLOSE_DATE, SHOP_OPEN_DATE) values (24402, '105420580990038', '宜昌市云集门诊部', '82714596 ', 1000, to_date('01-05-2008', 'dd-mm-yyyy'), to_date('01-06-2008', 'dd-mm-yyyy')); insert into A_TEST_FROM (ID, CR_SHOP_NO, SHOP_NAME, SHOP_TEL, YEAR_INCOME, SHOP_CLOSE_DATE, SHOP_OPEN_DATE)

Datastage控件使用指南

Datastage 控件使用指南

目录 1. 引言 (1) 2. 常用STAGE使用说明 (1) 2.1.S EQUENTIAL F ILE S TAGE (1) 2.2.A NNOTATION (4) 2.3.C OLUMN E XPORT S TAGE (5) 2.4.C HANGE C APTURE S TAGE (7) 2.5.C OPY S TAGE (9) 2.6.F ILTER S TAGE (10) 2.7.F UNNEL S TAGE (11) 2.8.T ANSFORMER S TAGE (12) 2.9.S ORT S TAGE (13) 2.10.L OOK U P S TAGE (14) 2.11.J OIN S TAGE (14) 2.12.M ERGE S TAGE (16) 2.13.M ODIFY S TAGE (17) 2.14.D ATA S ET S TAGE (18) 2.15.F ILE S ET S TAGE (19) 2.16.L OOKUP F ILE S ET S TAGE (21) 2.17.O RACLE E NTERPRISE S TAGE (23) 2.18.A GGREGATOR S TAGE (24) 2.19.R EMOVE D UPLICATES S TAGE (26) 2.20.C OMPRESS S TAGE (27) 2.21.E XPAND S TAGE (28) 2.22.D IFFERENCE S TAGE (29) 2.23.C OMPARE S TAGE (31) 2.24.S WITCH S TAGE (32) 2.25.C OLUMN I MPORT S TAGE (33) 3. DATASTAGE MANAGER使用 (35) 3.1.导入导出J OB及其它组件 (35) 3.2.管理配置文件 (37) 4. DATASTAGE ADMINISTRATOR常用配置 (39) 4.1.设置T IME O UT时间 (39) 4.2.设置P ROJECT的属性 (40) 4.3.更新D ATA S TAGE S ERVER的L ICENSE和本地C LIENT的L ICENSE (41) 5. DATASTAGE DIRECTOR使用 (41) 5.1.察看J OB的状态,运行已经编译好的J OB (41) 5.2.将编译好的J OB加入计划任务 (44) 5.3.监控J OB的运行情况 (45)

有效教学视域下小学英语利用数字教育资源的模式

有效教学视域下小学英语利用数字教育资源的模式 发表时间:2019-06-24T11:00:16.000Z 来源:《成功》2019年第2期作者:周雪梅 [导读] 英语是小学教学中一门相对较新的学科,和语文数学不同,由于其诞生时期较晚,在教学模式并没有如语文或者数学一般有一套完善且行之有效的教学模式。本文探讨的方向,是数字教育资源在小学英语教学中而对运用,笔者将通过对信息技术在英语教学中的应用分析提出符合“有效教学”要求的教学模式。 南充市嘉陵区火花三小四川南充 637900 【摘要】英语是小学教学中一门相对较新的学科,和语文数学不同,由于其诞生时期较晚,在教学模式并没有如语文或者数学一般有一套完善且行之有效的教学模式。本文探讨的方向,是数字教育资源在小学英语教学中而对运用,笔者将通过对信息技术在英语教学中的应用分析提出符合“有效教学”要求的教学模式。 【关键词】有效教学;小学英语;数字教育 一、英语的学习难点及数字资源对英语教学的帮助 学习英语的一大难处在于英语本身并非学生的母语,学生在学习过程中实际是对第二语言的钻研的了解,因为缺少先天性的耳需目染,要熟练掌握英语显然不是件容易的事情[1]。 如何用一种高效的教学方法,打破空间和时间的限制,让学生随时处在教学熏陶中?这成为教学工作者着重思考的问题。现代社会是信息时代,网络技术的发达使得人们在交流上更加便利,也更加顺畅,如果使用网络工具进行信息交流和共享,在一定程度上能教师的教学深度扩展化。 此外,物联网、计算机、手机这些我们重用的工具都可以作为使用和利用数字资源的媒介,教师需要调动数字资源投入教学本身也十分便利。具备如此强大的包容力和广泛性的资源如果能有效的融入教学领域中,势必能发挥出优秀的教学价值[2]。 二、数字资源在小学英语教学中的教学模式组成 “有效教学”本身有一个极大的限定概念,即教学的内容必须在有限时间以及空间内完成,这要求有效教学视域下的教学方式必须保持紧密的合理性与恰当性,教师要保持学生对活动的参与度和学习意识,能够积极融入活动中,和自己展开高效的互动。北京师范大学外语教授王蔷提道:“英语学科的核心素养是语言、思维、文化和学习四方面构成。”英语的教学不应当停留在简单的知识填充,而是要深入的挖掘蕴藏在语种中的语言习惯、思维方式、文化底蕴,最终才能真正学习英语,吃透英语。因此应用数字资源创设出符合让学生正确学习英语的教学模式,是教学工作者要认真思考并施行的教学任务。 (一)数字资源聚合方法体系 数字资源聚合方法体系是集合数字资源的重要方法,对教师来说,在教学过程中实现数字资源聚合是有效在教学中运用数字资源的前提,因此,教师在制定体系的结构时,需要从总体上保证数字资源聚合各种功能的实现,功能和结构间存在对应性,存在契合性是构建聚合资源体系的重要基础。因为信息领域的广泛,数字资源聚合方法的来源是多方面、多角度的、数字资源聚合体系的构架方法有众多的类别、性质也是多元、多角度的,教师在创建数字资源运用架构时,必须要设置出对应的层次表,以此才能保证资源的切实运用,具体如表1。 知识关联层(深度语义)本体语义丰富、形式化程度高;扩展性、时效性差 关联数据多领域资源聚合,扩展性强;封闭系统的资源无法处 概念关联层 (语义增强)主题词表结构化、形式化程度高、可复用、自由度低、时效性 分众分类法易用,体现群体智慧;受控性,规范性和结构性差 文献计量(内容特征)处理数据量大,维度高,语义关联度 概念聚类层文献计量(外部特征) 社会网络分析多维,多层级;全面性差、自由度低、聚合过程复杂 表1数字资源聚合方法层次框架 数字资源的体系结构应该是自下而上,有明确的概念聚类层、概念关联层以及知识关联层,整个模式的划分应当清晰细致且具备动态演化性,在教学过程中可以自由的转变,知识粒度根据教学需求自行调整,包括语义化的程度、形态也在语法、语用间随意转化,使不同的知识单元间建立起多元化关联。 (二)数字资源的描述、聚合及可视化 数字资源的数据渠道包含不同的来源、不同的类型,因此在数字资源模式中存在如存储分布、异质异构等特征,教师在收集和运用数字资源时要注意其可靠性,来源必须依托在真实的对象或环境上,且在时间标准上要秉承近期优先的原则,在缺少相应的教学所需资源的情况下再考虑回溯收集,以此保证教学过程中数字资源的新鲜性和时效性。在条件允许的状况下,教师可以使用Datastage作为数字资源的储存库,对信息进行自由的抽取和转换,用ICTCLAS等软件工具对文档资源进行对应分词,将其和英语专业的词典结合,从数字资源中提取到实体信息和特征向量,将最有用的内容抽取出来,在投入到教学中。教师应当在本体和外覆包上建立数据模型,让手中的数字资源和教学保持通用性,做到把复杂抽象的知识点可视化,用视觉方式呈现数字资源的运用模式以及使用结果。将数字教学资源中的节点元素之间错综复杂的链接关系和语义关联以直观形象展现到教学视域中,以此保障学生充分发挥直观认知和理解能力,发现教学数据后的隐藏知识,将教师想要传递的内容最大化的吸收进大脑内。可视化结果需要符合学生的视觉习惯,还应能根据数据资源重要程度设定视觉距离和表现形式。而最重要的地方在于,教师要在教学中提供与学生的交互机会,让学生可以根据结果进一步改变融入数字教学的过程,通过亲身体验,亲身经历,最大化的将数字内容进行知识“变现”,把信息节点转化为知识、载体等实体元素,让自己的英语能力和英语词汇储备得到增加。 通过数字资源及工具的运用,教师可以有效调动起小学生对于英语学习的兴趣,这样一来小学生的基础性英语听说能力能够在学习中

DataStage优化培训笔记.doc

DataStage优化培训笔记 Sequential file 1、注意reject mode的设置 2、优化:(在文件定长的前提下) number of readers per node 设定单节点的多个读取,根据实际情况设置多读个数 read from multiple nodes 设定多节点的数据读取 Change Capture Stage 比较数据后会进行排序,如果之前的数据已经做了排序,则需要改变排序属性。 注意before 和after 的设置,不要设反。 Copy Stage 在内存中操作的组件,建议1进多出用copy组件 Tansformer Stage 是内嵌的程序,一旦作业执行到此stage 程序会暂停进程,外部调用so的程序,Transformer组件中包含的函数,可以自己编写函数进行嵌入(通过routine实现) filter不能用于复杂的判断,copy不能增加赋默认值的字段.. Sort Stage 尽量不用,属于滞留组件,要等数据齐全后再能进行sort操作 LookUp和Join的区别需要注意 join一定要进行排序再进行处理(效率较低), LookUp是流水线实现(超过800M不能用此stage) Data Set Stage Stage自动设置数据为定长,实现多值读取,可以通过drop on input来限制输入数据。 生产环境优化: 关注CPU(并发路数,逻辑节点数,物理作业数),内存,I/O交互 1、在Oracle Enterprise 中使用select语句时,提取尽量少的字段数据 2、在使用LookUp Stage时,如果数据从Oralce出来的,在LookUp table(参照表中)可以设置Lookup type=sparse(此方式是数据不提取到内存,直接在表中进行操作) 3、在Oracle Enterprise中设置Partition table="需要查询的表名"可以实现多进程读取数据 4、在文件系统中,为平衡节点负载,建议数据的输入和输出放在不同的磁盘上(可通过节点进行设置,如Sequential_File中设置FILE的路径) 5、尽量少用repartition(sort stage 、join stage等组件需要对数据进行repartition) 6、要保证有足够的scratch空间,当此空间满了之后,系统会把数据转移到tmp空间,效率变低 7、网络瓶颈会影响作业效率(局域网通讯,Node之间的通讯问题) 8、在MAIN机器上,设置是否关闭jobmonitor进程(pools"" 为默认节点,需要进行节点运

相关主题
文本预览
相关文档 最新文档