Wednesday, February 27, 2008

Storing large files

Usually data storage is the province of the IT department, but sometimes a test engineer has to get involved. Here's a good example:

A few years ago I started work on a system that would take pictures of lit LEDs at the wafer level. It would analyze the image and save analysis data to the database. But we also had a need to save the image itself.

The images were large (over a megabyte), and even as PNG files (a lossless compressed format) they were typically over 300 KB. I had thought about saving them as BLOBs in the database - that seemed like a simple solution. But after discussing it with our IT consultant (who later became our IT manager), I decided to save the images on the local network and record each image's location in the DB. A year later the test system was complete, and saving images this way worked very well. The images were often used for post-mortem analyses of bad wafer lots or mask issues.
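To make the "file on the network, path in the DB" idea concrete, here is a minimal sketch in Python. It is not the code from the original system: the table name `wafer_images`, the columns, the function names, and the use of SQLite are all assumptions chosen for illustration, and a real setup would point `storage_root` at a network share and use whatever database the test system already talks to.

```python
import sqlite3
from pathlib import Path


def init_db(conn):
    """Create the (hypothetical) table that maps a wafer to its image file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wafer_images ("
        "wafer_id TEXT PRIMARY KEY, "
        "image_path TEXT NOT NULL, "
        "size_bytes INTEGER NOT NULL)"
    )


def save_image(conn, storage_root, wafer_id, image_bytes):
    """Write the image to disk, then record only its location in the DB."""
    storage_root = Path(storage_root)
    storage_root.mkdir(parents=True, exist_ok=True)

    image_path = storage_root / f"{wafer_id}.png"
    image_path.write_bytes(image_bytes)

    conn.execute(
        "INSERT INTO wafer_images (wafer_id, image_path, size_bytes) "
        "VALUES (?, ?, ?)",
        (wafer_id, str(image_path), len(image_bytes)),
    )
    conn.commit()
    return image_path


def load_image(conn, wafer_id):
    """Look up the stored path for a wafer and read the file back."""
    row = conn.execute(
        "SELECT image_path FROM wafer_images WHERE wafer_id = ?",
        (wafer_id,),
    ).fetchone()
    if row is None:
        raise KeyError(wafer_id)
    return Path(row[0]).read_bytes()
```

The database row stays tiny no matter how big the image is, which is the whole point. The trade-off is that the file and the row can drift apart (someone moves or deletes the file), so a real system would want some kind of consistency check.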

Well, this issue has come up again at my new job. In this case it's not just image files, but large dataset files as well. Again, we've decided to save them as separate files and just record their locations in the database. This time I actually have justification for the decision: a paper from Microsoft Research ("To BLOB or Not To BLOB" - cute) compares storing objects in MS SQL Server 2005 against storing them as files. Its conclusion is that objects smaller than 256KB are better kept in the DB, objects larger than 1MB are better saved as files, and the range between 256KB and 1MB is a gray area that depends on the workload. Our files are large enough that separate files are the sensible choice.
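As a rough rule of thumb, the thresholds quoted above can be written down as a tiny helper. This is only a sketch: the function name and return strings are made up for illustration, and the numbers come from one study of one database version, so treat the result as a starting point rather than a verdict.

```python
KB = 1024
MB = 1024 * KB

# Thresholds quoted from the paper (measured on SQL Server 2005).
DB_THRESHOLD = 256 * KB   # below this, the database tends to win
FILE_THRESHOLD = 1 * MB   # above this, the filesystem tends to win


def suggest_storage(size_bytes):
    """Suggest where an object of the given size should probably live."""
    if size_bytes < DB_THRESHOLD:
        return "database"
    if size_bytes > FILE_THRESHOLD:
        return "filesystem"
    # Between the two thresholds the answer depends on the workload.
    return "gray area"
```

For example, `suggest_storage(300 * KB)` lands in the gray area, which is about where my PNG files were.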

In summary, it's sometimes useful to know something about other disciplines, especially if you are a test engineer.
