Following on from part 1 of this blog – http://www.redstk.com/using-bics-datasync-to-load-data-part-1/ – where I covered installing and initially configuring the DataSync utility, I’m now going to run through the steps to set up a simple data transfer to move some data into my BI Cloud database.
The first step in the DataSync client is to click on the Projects button in the top menu bar. This opens the projects window, which at present is empty.
An important point to remember on this screen is to check which project you are working in. A project is a suite of tables/files to upload in a single session. You may well need multiple projects to populate a BI Cloud database – either because you are loading data from multiple sources or because you need to schedule different tables at different times; maybe some tables are updated daily and others weekly, for example.
In the menu bar at the top of the screen is a drop-down list of the projects that have been created. At present I only have one – ‘Pauls BICS DataSync Project’ – but you can create more from the menu under File -> Projects.
For this project I have a set of tables in a local Oracle database which I need to transfer to my BI Cloud database, and in part 1 of this blog I set up a connection to this database – ‘BI Datawarehouse’.
So, of the tabs across the top of the upper window, I need the ‘Relational Data’ tab.
I can now click on ‘Data From Table’ to import the table definitions. The following window pops up, allowing us to search for and then select tables. I’ve only got the one source connection, so it defaults to that. In the table name field I could leave the asterisk and view all tables, or filter the list down.
For this example I’m using the Oracle SampleApp database, so I’m going to filter on tables starting with SAMP. After clicking on ‘Search Tables’ I get the list of available tables to choose from.
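As an aside, if you want to preview which tables a wildcard will match before searching, you can run the equivalent query directly against the source schema – this is just standard Oracle data dictionary SQL, nothing DataSync-specific:

    -- List the tables in the connected schema that the SAMP filter will match
    SELECT table_name
    FROM   user_tables
    WHERE  table_name LIKE 'SAMP%'
    ORDER  BY table_name;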
It’s a little annoying that you can’t resize this window to cope with a long list of tables, but I can now select the tables I want – just nine of them for now.
Next click ‘Import Tables’. After a few seconds the tables appear in the project window:
There are a few settings we can apply to each table at this stage.
Firstly, we can specify the exact query to use against the table. The default will be a simple ‘select * from…’, but in the query you can add where clauses or any other valid select syntax. A common use of this is to restrict the data to the most recently updated records, enabling incremental uploading of data – a simple example might be “select * from table_name where update_date > sysdate-1”, to only transfer data loaded since yesterday.
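As a slightly fuller sketch of an incremental query – the table and column names here are hypothetical, not from the SampleApp schema – you could drive the increment from a control table that records the last successful load:

    -- Hypothetical incremental extract: MY_SOURCE_TABLE, LAST_UPDATE_DATE
    -- and ETL_CONTROL are illustrative names, not real objects
    SELECT s.*
    FROM   my_source_table s
    WHERE  s.last_update_date > (SELECT MAX(load_date)
                                 FROM   etl_control
                                 WHERE  table_name = 'MY_SOURCE_TABLE')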
Next you can specify the load strategy – we’ll look at this below. You can also specify the maximum number of permitted errors before the transfer is aborted, and you can temporarily disable individual tables from the transfer.
Clicking on the Load Strategy field opens up the following window providing options for loading data into each table in the project.
The default, ‘Replace data in table’, will perform a truncate and then resend all data from the table (or from the query, if one was entered).
Append will insert records without deleting existing records or checking whether they already exist – this will generate errors if keys prevent duplicate records, which you can tolerate via the maximum number of errors field if you desire.
The update table option uses keys on the table to either add new records or update existing records, and you can specify whether to only add new records, only update existing records, or both.
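As a rough analogy only – this is not what DataSync literally executes under the covers, just the SQL the first two strategies broadly correspond to on the target (TARGET_TABLE and SOURCE_ROWS are placeholders):

    -- 'Replace data in table': clear down, then reload everything
    TRUNCATE TABLE target_table;
    INSERT INTO target_table
    SELECT * FROM source_rows;

    -- 'Append': insert only; rows that violate a key constraint fail
    -- and count towards the maximum permitted errors
    INSERT INTO target_table
    SELECT * FROM source_rows;

The update strategy is a little more involved, so it is sketched after the key-column discussion below.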
If you are using the update table option you then need to specify which fields make up the key for identifying records to update. To do this, click on the table in the upper project window. In the lower detail pane there are two tabs: the first, ‘Edit’, just displays the same fields as the upper project window; the second, ‘Table Column Mapping’, lists the fields on the table, and the ‘Update Rows on Match’ column lets you choose which fields to use:
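To make the matching behaviour concrete, here is a hedged sketch of the kind of merge the update strategy corresponds to, assuming CUST_KEY was the column ticked under ‘Update Rows on Match’ (all names hypothetical):

    -- CUST_KEY is the ticked match column: matched rows are updated,
    -- unmatched rows are inserted (or only one of the two, depending
    -- on the option chosen in the Load Strategy window)
    MERGE INTO target_table t
    USING source_rows s
    ON (t.cust_key = s.cust_key)
    WHEN MATCHED THEN
      UPDATE SET t.revenue = s.revenue
    WHEN NOT MATCHED THEN
      INSERT (cust_key, revenue)
      VALUES (s.cust_key, s.revenue);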
Once you’ve finished setting the details for each table in the project click the ‘Save’ button in the bar across the top of the project window.
So now we are ready to run our project – to transfer the data in these tables from the source database to the BI Cloud database.
We do this via “Jobs”. A job is an instance of running a particular project and can be associated with a schedule to regularly run it. Clicking on the Jobs button in the top menu bar opens the Jobs window:
Each project will have one Job automatically created – with the suffix ‘Job1’, although this can be renamed. To run the job immediately click the ‘Run Job’ button.
You get an ‘are you sure’ window and then a message stating that the job has been submitted.
Click on the ‘Current Jobs’ tab to see the job progress. The bottom tab provides more detail.
Click on the Tasks tab to see its progress through each table:
Each table goes through phases of being queued (in yellow), running (blue) and finally completed (green). There is a fourth phase – failed (red) – but that hasn’t happened here. The DataSync utility can multi-task, which is why four tasks are running at the same time above.
Once each table is complete the window displays the number of successful and failed rows.
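If you want a sanity check beyond the on-screen counts, a simple row count on the target – run, for example, from the APEX SQL Workshop that comes with the BICS database schema service – compared against the same count on the source will confirm the load (SAMP_REVENUE is just a stand-in for one of the transferred tables):

    -- Quick post-load sanity check: compare with the source count
    SELECT COUNT(*) FROM samp_revenue;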
The Job History tab will display a list of all the times each project has been run; right now it will just show the one successful record:
You can manually run a job as often as you like, simply return to the Jobs tab and click Run Job again.
Alternatively you can schedule the job to run automatically using the Job Schedules tab.
Clicking on the New button opens up a new line. In the bottom window you enter the details of the schedule – a name for it, the job to run and then the frequency details. Here I’ve set up my job to run daily at 10pm from today onwards:
Obviously the client does not need to be open for the schedule to run – the DataSync agent running in the background will do all the work. The agent must of course be running, so make sure it gets restarted whenever the server is rebooted!
So there we are – a quick overview of using DataSync to load data into a BI Cloud database. Enjoy!