Update (May): My blog post on DataFlux, below, is visited many times a day by readers all over the world, but they are often disappointed to learn that DataFlux is now fully integrated with SAS. Although that integration provides a great feature set, it also makes the software much more complicated and more expensive; in short, it is not as easy to use as it once was.

Well, today I have some great news to share. So what is DataFlux? In this example, a human could probably figure out fairly easily that the first two Victors are one and the same, and that Bill in SLO and William in San Luis Obispo are also the same person. Some data inconsistencies are also obvious, such as name prefixes and suffixes, inconsistent casing, and incomplete address data. This interface is new in version 8 and provides quick access to the functions one would use most often.

This change is actually helpful, unlike some GUI changes companies make, because it combines a lot of the settings in one central place.

Profiling: Most nodes here provide a synopsis of the data being processed.


Here profiling can be linked to other actions.

Gender Analysis: Determines gender based on a name field.

Identification Analysis: e.g. …

Enrichment: As the name suggests, these nodes help enrich data, i.e. …

Monitoring: Allows for action to take place on a data trigger, e.g. …

Either way, the data will appear in my preview window; instant gratification is one of the great things about Architect.

I want to point out just two things here. This likely happened because too many fields are wrong, and the USPS data verification system is designed not to guess too much.

Had I used that, the correct ZIP+4 would have been calculated. This is because the USPS system recognizes it as an address within a valid range. Nonetheless, pretty neat, eh? For this reason you see it here.

After that, I can preview as before. Note how well DataFlux picked out the first, middle and last names, not to mention the prefixes and suffixes. There are many options to choose from. You might be wondering how DataFlux does this.

DataFlux uses several algorithms along with lists of known last names, first names, etc. Context matters too: the placement of a comma in a name, for instance, greatly improves the parser's ability to determine the location of the last name.
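To make the comma point concrete, here is a minimal Python sketch of a rule-based name parser. It is not DataFlux's parser (which also draws on its own lists of known names); the prefix and suffix sets and the `parse_name` function are hypothetical. Without a comma, the sketch simply assumes the last token is the surname, so a multi-word surname such as "Van der Berg" gets split wrongly; the "Last, First" layout removes that ambiguity.

```python
# Hypothetical illustration only; NOT DataFlux's actual parser.
PREFIXES = {"mr", "mrs", "ms", "dr"}
SUFFIXES = {"jr", "sr", "ii", "iii", "phd"}


def _norm(token: str) -> str:
    """Lower-case a token and strip leading/trailing periods for lookups."""
    return token.strip(".").lower()


def parse_name(raw: str) -> dict:
    """Split a free-form name into prefix, first, middle, last and suffix.

    A comma is treated as a strong hint for the 'Last, First Middle' layout;
    without one, the sketch just assumes the final token is the last name.
    """
    if "," in raw:
        last_part, rest = raw.split(",", 1)
        last_tokens = last_part.split()
        tokens = rest.split()
    else:
        last_tokens = []
        tokens = raw.split()

    prefix = tokens.pop(0) if tokens and _norm(tokens[0]) in PREFIXES else ""
    suffix = tokens.pop() if tokens and _norm(tokens[-1]) in SUFFIXES else ""
    if not last_tokens and tokens:
        last_tokens = [tokens.pop()]  # guess: last remaining token is the surname
    first = tokens.pop(0) if tokens else ""
    return {
        "prefix": prefix,
        "first": first,
        "middle": " ".join(tokens),
        "last": " ".join(last_tokens),
        "suffix": suffix,
    }


print(parse_name("Anna Van der Berg"))   # last name guessed as "Berg"
print(parse_name("Van der Berg, Anna"))  # comma: last name is "Van der Berg"
```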


For example, oftentimes (perhaps most of the time) nothing can be done about data in a system once it is entered.

This step is important because of the intelligent parsing, name lookups, etc. involved. Here you can see that match codes ignore minor spelling differences and take into account abbreviations, nicknames, etc. Why is this so significant? We now have an easy way to find duplicates! Match codes could be stored in a database to allow quick duplicate checks!
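The idea behind match codes can be sketched in a few lines of Python. This is an assumption-laden toy, not the DataFlux algorithm: the nickname and abbreviation tables, the crude phonetic `squash` rule, and the sample records are all made up for illustration. Records whose codes collide are flagged as likely duplicates, which is the quick duplicate check described above.

```python
import re
from collections import defaultdict

# Hypothetical lookup tables for illustration; a real QKB is far richer.
NICKNAMES = {"bill": "william", "will": "william", "bob": "robert"}
ABBREVIATIONS = {"slo": "san luis obispo"}


def squash(word: str) -> str:
    """Crude phonetic key: unify a couple of spellings, keep the first letter,
    drop later vowels (and h, w, y), and collapse consecutive repeated letters."""
    word = word.replace("ph", "f").replace("k", "c")
    key = word[:1]
    for ch in word[1:]:
        if ch in "aeiouhwy":
            continue
        if key[-1] != ch:
            key += ch
    return key


def match_code(first: str, last: str, city: str) -> str:
    """Fuzzy key that ignores case, punctuation, known nicknames,
    known abbreviations and some minor spelling differences."""
    first = re.sub(r"[^a-z]", "", first.lower())
    first = NICKNAMES.get(first, first)
    last = re.sub(r"[^a-z]", "", last.lower())
    city = " ".join(re.findall(r"[a-z]+", city.lower()))
    city = ABBREVIATIONS.get(city, city)
    parts = [squash(first), squash(last)] + [squash(w) for w in city.split()]
    return "$".join(parts)


def find_duplicates(records):
    """Group record indexes by match code; groups of two or more are likely duplicates."""
    groups = defaultdict(list)
    for idx, (first, last, city) in enumerate(records):
        groups[match_code(first, last, city)].append(idx)
    return [ids for ids in groups.values() if len(ids) > 1]


sample = [  # invented sample records
    ("Victor", "Fehlberg", "San Luis Obispo"),
    ("Viktor", "Felberg", "San Luis Obispo"),
    ("Bill", "Smith", "SLO"),
    ("William", "Smith", "San Luis Obispo"),
]
print(find_duplicates(sample))  # [[0, 1], [2, 3]]
```

Storing the output of `match_code` in an indexed column would turn the duplicate check into a simple equality lookup.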

Note that the cluster numbers are the same for records that match, based on the clustering conditions I set a moment ago. There are a lot of really neat things that DataFlux can do.

This is really fantastic, and very useful for beginners like me. Could you clarify the following points for me?

The new Data Management Platform is perfect for what you are looking for. In DataFlux, at run time, embedded jobs basically become part of the parent job.

Let me know if I addressed your issue… if not, please provide more detail. I sent this question off to the DataFlux team, and they sounded appreciative, but never responded with any information — sorry. Most of what I have is specific to what my company is trying to accomplish. What are you looking for?

I am just trying to understand how to write DataFlux scripts, and I would like to see some samples or a procedure for writing them. Which language is used for coding expressions within Utilities Data Validation, or generally within DataFlux? It is certainly not Base SAS. In some ways it looks a bit like VB.

I found that looking at the Expression Reference in the Help files was sufficient for me to figure out anything I needed to code.

Hi Victor, I wanted to customize my QKB to change all the address data in my database so that the address field would be only …

Hi guys, does anyone have DataFlux material for beginners? I have just joined a project where we are using DataFlux, and I have no idea about it. Could anyone help me? If anyone has material, could you please send it to this mail id: sp. I need more details on DataFlux.

I somehow missed your comment. There are a couple of recorded demos on the DataFlux portal that may be helpful.

Hi Victor, I am not able to find those two topics under Webcast. I can find the webcast link, but not the webcast demo.

Could you help me get the link?

Would I have to run all the addresses through verification and clustering again to ensure that the latest change gets picked up?


This has to be obtained from a third-party vendor such as InfoUSA. If a ZIP code changes, you would run the addresses through address verification again.


You could re-cluster if you suspect that the change might break up clusters or create new ones, but this step would be optional. If you had calculated area codes for phone numbers, you could redo that calculation in a similar fashion.

They will send me more details in the coming days.

Thank you for taking the time. Please do let me know when NCOA support becomes available and how to use it.

I have millions of records to load, but it is taking forever…

Regarding the DB2 insert rate, have you already tried changing the commit interval? Is the database co-located or remote?

The database is on a remote AIX server and DataFlux is on Windows.

Regarding the DB2 insert, how much did you increase the interval? I would recommend a value of 10,… Look at the DataDirect driver documentation to find out more about other options.
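The commit-interval suggestion rests on a general idea: committing once per batch of rows instead of once per row reduces per-transaction overhead. The Python sketch below only illustrates that batching pattern; it is not how DataFlux or the DB2 driver implements it. It uses the standard library's `sqlite3` module as a stand-in for the remote DB2 connection, and the `customers` table and `load_rows` helper are hypothetical. The interval value is left to the caller, since the right number depends on your environment.

```python
import sqlite3
from typing import Iterable, Sequence, Tuple

INSERT_SQL = "INSERT INTO customers (name, city) VALUES (?, ?)"  # hypothetical table


def load_rows(conn: sqlite3.Connection,
              rows: Iterable[Sequence[str]],
              commit_interval: int) -> Tuple[int, int]:
    """Insert rows in batches, committing once per `commit_interval` rows.

    Returns (rows_inserted, commits_issued).
    """
    if commit_interval < 1:
        raise ValueError("commit_interval must be >= 1")
    cur = conn.cursor()
    batch, inserted, commits = [], 0, 0
    for row in rows:
        batch.append(row)
        if len(batch) == commit_interval:
            cur.executemany(INSERT_SQL, batch)
            conn.commit()  # one commit per full batch instead of per row
            inserted += len(batch)
            commits += 1
            batch.clear()
    if batch:  # flush the final, partial batch
        cur.executemany(INSERT_SQL, batch)
        conn.commit()
        inserted += len(batch)
        commits += 1
    return inserted, commits


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
    print(load_rows(conn, [("Bill", "SLO")] * 25, commit_interval=10))  # (25, 3)
```

With 25 rows and an interval of 10, the loader issues three commits (10 + 10 + 5) rather than 25.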

I will investigate the bulk load further. Is there any way we could make the nodes run faster, setting aside the database load portion?

How are you running the DF job?

Is it running on a Windows client machine? Any time my company processes a load that large, it uses a server environment.

What is DataFlux? | Victor Fehlberg’s Tech Postings

Perhaps you should consider using a more powerful platform as well. You could get a day trial license from DF to see whether that would indeed solve your problems.

I am wondering whether, by chance, you have run DataFlux Architect jobs from the command line.

Hi Victor, really nice and useful post. Have you written the blog post you mentioned, where you have posted some dfArchitect …? I am really keen to go through it. I am new to DataFlux and need to use it for some data QA.

Would it be possible for you to direct me to some basic tutorials? You could also e-mail them to me in case you have some PDFs. I am new to DataFlux and am trying to find my way around. I would like to know how to attach a QKB. As of now I am not able to run a DataFlux job, since it says no QKB has been specified. Also, I am not able to find any locales in the drop-down list of the Gender Analysis node.

You first need to download and install a QKB. Have you done so? What version of DF are you using?

Also, a quick question with regard to attaching a database: I need to analyse a table in an Oracle DB.

