etl - Programmatic data conversion strategy


I have a product that imports data files from clients (i.e., user directories, etc.) and exports other types of data (i.e., reports, etc.). Imports and exports are in CSV format (RFC 4180), and files are passed back and forth through managed file transfers.

Increasingly, I'm seeing requests from clients to transform and reconfigure these data files for use in their legacy systems. For import data files, it's bizarre requests like:

"We're passing 20 columns; apply $business_logic to columns 4, 7, 5, 18, and 19 to determine the actual value the system needs in column 21, then drop the original columns because they aren't useful by themselves."

or

"The value in column 2 is padded with zeros, please strip those off."
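As a rough illustration, both import-side requests above can be handled with the standard-library `csv` module. This is only a sketch: the request's column numbers are treated as 1-based, and the `$business_logic` step is stood in for by a placeholder concatenation, since the real rule isn't specified.

```python
import csv
import io

def transform_import_row(row):
    """Apply the two import-side fixes described above. Columns are assumed
    1-based (so column N is row[N-1]); joining with "|" is a placeholder
    for the client's actual $business_logic."""
    # Derive the "column 21" value from columns 4, 7, 5, 18, 19.
    derived = "|".join(row[i - 1] for i in (4, 7, 5, 18, 19))
    # Strip the zero padding from column 2 (keep a lone "0" if it was all zeros).
    row[1] = row[1].lstrip("0") or "0"
    # Drop the source columns, then append the derived value as the new last column.
    keep = [v for i, v in enumerate(row, start=1) if i not in (4, 7, 5, 18, 19)]
    return keep + [derived]

raw = "id,00042,a,w,x,b,y,c,d,e,f,g,h,i,j,k,l,z1,z2,t\n"
rows = [transform_import_row(r) for r in csv.reader(io.StringIO(raw))]
```

The point is less the specific logic than that each request reduces to a small row-level function, which is what makes a rule-per-client approach feasible.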

For data export files, it's requests like:

"You're sending a .csv, but we need it in our special fixed-width format."

or

"You're formatting numbers with decimals. Remove those, and prefix with 8 zeros."
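The export-side requests are similarly small in isolation. A hedged sketch of both, where the field widths and truncation/padding behavior are assumptions (a real client spec would dictate them):

```python
def to_fixed_width(fields, widths):
    """Render one record as a fixed-width line: each field is truncated to
    its slot width and right-padded with spaces. Widths here are invented."""
    return "".join(f[:w].ljust(w) for f, w in zip(fields, widths))

def reformat_amount(value):
    """Drop the decimal point and prefix eight literal zeros, per the request."""
    return "0" * 8 + value.replace(".", "")

line = to_fixed_width(["ACME", "12.50", "US"], widths=[10, 12, 2])
amount = reformat_amount("12.50")
```

The hazard with fixed-width output is exactly the kind of gotcha mentioned below: overflow, alignment (numbers are often right-justified), and encoding all vary per client, so the widths and padding rules need to live in per-client configuration rather than in code.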

Of course, every client we onboard has different requirements. I'm hesitant to dive in and write something from scratch, as I imagine there are all sorts of gotchas along the way in building out files of different formats (CSV, TSV, fixed width, Excel, stone tablets), dealing with character encoding, etc., etc. I'm looking for some sort of dev framework (or commercial product) that would allow me to satisfy this increasing number (and variety) of data transformation requests. Lightweight & simple preferred.
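Whether you build or buy, the per-client variation usually ends up expressed the same way: a table of named, composable row transforms keyed by client. A minimal sketch of that shape, with the client names and rules invented purely for illustration:

```python
def strip_zeros(row, col):
    """Remove leading-zero padding from one column."""
    row[col] = row[col].lstrip("0") or "0"
    return row

def drop_decimal(row, col):
    """Remove the decimal point from a numeric column."""
    row[col] = row[col].replace(".", "")
    return row

# Hypothetical per-client configuration: an ordered list of
# (transform, kwargs) steps applied to every row for that client.
CLIENT_RULES = {
    "acme": [(strip_zeros, {"col": 1})],
    "globex": [(strip_zeros, {"col": 1}), (drop_decimal, {"col": 2})],
}

def apply_rules(client, row):
    for fn, kwargs in CLIENT_RULES.get(client, []):
        row = fn(row, **kwargs)
    return row

out = apply_rules("globex", ["id", "00042", "12.50"])
```

Commercial and open-source ETL tools are essentially this pattern with a UI, connectors, and format/encoding handling on top, which is where most of the homegrown gotchas actually live.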

Any thoughts or experiences appreciated.

I'm not sure if it's a total fit, but you can check out streamsets.com.

It's an open-source tool for data movement and lightweight transformations. It allows you to provide a minimal input schema (e.g., you have CSV files), so you don't have to deal with a lot of the things you mentioned.

*Full disclosure: I'm an engineer at StreamSets.

