Unstructured Data: Creating the Analytical Environment
Venue: To be advised
Location: London, United Kingdom
Event Date/Time: Dec 04, 2008 | End Date/Time: Dec 05, 2008
Description
With DW 2.0, the idea arose that unstructured data is best placed in a data warehouse, where it can be analyzed along with the structured data found there.
This seminar covers the work needed to take textual data out of the confines of documents and integrate it into a data warehouse. It is a very down-to-earth seminar and workshop. The first day is a lecture on the background material needed to understand the architecture surrounding the placement of text in an analytical data warehouse environment. The second day is a workshop that shows, step by step, how text is converted into a database that can then be placed into a data warehouse.
There is a big difference between searching text and analyzing text. The seminar brings out the important distinctions between the two.
The hardest part of moving text into a data warehouse is the integration of the text. Anyone can read a text file and toss the text into a database, but such an exercise is futile: the resulting database cannot be usefully processed by a BI tool. To produce a meaningful result, the analyst must carefully transform the text. Some of the basic issues of transformation include:
reading and understanding semi-structured data
applying external categories to text
creating internal taxonomies of text
standardizing dates for BI processing
identifying patterned variables
identifying named variables
resolving homographs, and so forth.
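As an illustration only, two of the issues above (standardizing dates and identifying patterned variables) can be sketched in a few lines of Python. This is not the method taught in the seminar. The date formats, the pattern names, and the `CN-` contract number layout are all hypothetical and would have to be replaced by whatever actually occurs in the documents being processed. The sketch also assumes month-first numeric dates and English month names.

```python
import re
from datetime import datetime

# Hypothetical list of date layouts seen in the source text.
# Assumption: numeric dates are month/day/year.
DATE_FORMATS = ("%b %d, %Y", "%d %B %Y", "%m/%d/%Y", "%Y-%m-%d")


def standardize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next layout
    return None


# Hypothetical patterned variables: values recognizable by shape alone.
PATTERNS = {
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "contract_id": re.compile(r"\bCN-\d{6}\b"),
}


def find_patterned_variables(text):
    """Return (variable name, matched value, character offset) rows."""
    rows = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            rows.append((name, match.group(), match.start()))
    return rows
```

Each row returned by `find_patterned_variables` could be loaded into a relational table alongside the document identifier, and dates passed through `standardize_date` can be compared and sorted by a BI tool regardless of how they were originally written.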
This seminar places special emphasis on the management of corporate contracts and on safety data for oil and gas pipelines and refineries.