Tuesday, September 13, 2011

what is the different between lookup and join in ETL

lookup and join are very different terminology, which have different funtionality, approach and technology
1 lookup is a 1 to many relationship, join can be any relationship like 1 to many or many to many
2 nomally lookup is left join ( assuming  master table is left), join can implemente left join, right join, inner join, outer join
3lookup can cache into memory, but join may be not
4 regarding lookup, a few tools do not need sort, because it can locate the record by lookup key, which similar with hash index, but join includes merge join and hash join, if we use merge sort to build the data warehouse, it will use a large amount of resource, becuase the data will be sorted and then do join, if there is a large amount of data, after joinning a few time, the performance will be a big problem.
normally the most of ETL tools implemente sort merge join.  so the most of ETL tool recommended to use lookup instead of join