SSTS Blog

Some news and tidbits that we share

paste

by SSTS
on Dec 20 in Blog 0 Comments

I learned something new recently while manipulating large files from a data provider. I needed to gather several years of daily financial information for several thousand companies. There were about 30 attributes, so the process had to be run about 30 times, producing 30 files. Once that was done, it all needed to be loaded into a database. I could have loaded the files individually, but in the end all the data needed to be joined. I actually tried loading it all into the database and letting the database do the join, but the data sets were so large that the join required more memory than I had. I had a "wouldn't it be great" moment, wondering if there was a way to join the files together in a streaming fashion. The lines were ordered such that line 1 of each file could be joined together in a consistent way (they all belonged to the same security and the same date), and so on down the files.
I was able to run the following command after placing all the CSV files in a directory:
paste -d ',' *csv > all_data.csv
After that I had one monster file ready to load to the database - pre-joined and all!
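
To make the positional join concrete, here is a minimal sketch on two tiny made-up files. The file names, the ticker "AAA", and the values are all hypothetical, and the layout (key columns only in the first file) is an assumption rather than the layout of the real data. Since paste matches rows purely by line number, the sketch checks that the line counts agree before joining.

```shell
#!/bin/sh
set -eu

# Hypothetical sample data: one file per attribute, rows in the same order.
workdir=$(mktemp -d)
printf 'AAA,2007-12-19,10.5\nAAA,2007-12-20,10.7\n' > "$workdir/attr_close.csv"
printf '1200\n1350\n' > "$workdir/attr_volume.csv"

# paste joins by position only, so confirm both files have the same line count.
close_lines=$(wc -l < "$workdir/attr_close.csv" | tr -d ' ')
volume_lines=$(wc -l < "$workdir/attr_volume.csv" | tr -d ' ')
[ "$close_lines" -eq "$volume_lines" ] || { echo "line counts differ" >&2; exit 1; }

# Join line N of every input file with commas, streaming through the inputs.
paste -d ',' "$workdir"/attr_*.csv > "$workdir/all_data.csv"
cat "$workdir/all_data.csv"
```

The output name here is chosen so that the glob does not match it. A pattern like *csv would also pick up an existing all_data.csv if the command were run a second time in the same directory.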

Read more http://billennis-ssts.blogspot.com/2007/12/paste.html


About the author

SSTS

Server Side Technology Solutions is a consulting firm that specializes in database design, development and support.
