Storing files, images, or other binary data in a database raises a number of performance issues. The more widely recommended solution for such a scenario is to write the files to disk and store only the metadata about each file and its location in the database. This approach has its pros and cons. If you have such a need and don't have the time to build a custom solution, you can try MogileFS, Compete filesystem (CFS), or Bit Mountain (BM). If you come from a Perl background, you can use MogileFS. For Python, you can try CFS or BM.
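To make the "file on disk, metadata in the database" idea concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the table layout, the function names (open_metadata_db, store_file, load_file) and the choice of SQLite with content-hash file names are all hypothetical and are not taken from MogileFS, CFS, or BM.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def open_metadata_db(db_path):
    """Create (if needed) and return a SQLite connection holding file metadata."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " key TEXT PRIMARY KEY,"
        " path TEXT NOT NULL,"
        " size INTEGER NOT NULL,"
        " sha256 TEXT NOT NULL)"
    )
    return conn


def store_file(conn, storage_dir, key, data):
    """Write the bytes to disk, then record where they live in the database."""
    digest = hashlib.sha256(data).hexdigest()
    path = Path(storage_dir) / digest  # hypothetical naming scheme: content hash
    path.write_bytes(data)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO files (key, path, size, sha256)"
            " VALUES (?, ?, ?, ?)",
            (key, str(path), len(data), digest),
        )
    return path


def load_file(conn, key):
    """Look up the path in the database and read the bytes from disk."""
    row = conn.execute("SELECT path FROM files WHERE key = ?", (key,)).fetchone()
    if row is None:
        raise KeyError(key)
    return Path(row[0]).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as storage_dir:
        conn = open_metadata_db(":memory:")
        store_file(conn, storage_dir, "avatar.png", b"fake image bytes")
        print(load_file(conn, "avatar.png"))
        conn.close()
```

Note that even in this tiny version the disk write and the database insert are two separate steps, so the database only ever holds small rows while the bulk data stays on the filesystem.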

The MogileFS usage flow is as follows:

  • app requests to open a file: it makes an RPC via the library to a tracker (whichever one is up) and issues a “create_open” request.
  • tracker makes some load balancing decisions about where it could go, and gives app a few possible locations
  • app writes to one of the locations (if it fails writing to one midway, it can retry and write elsewhere).
  • app (client) tells tracker where it wrote to in the “create_close” API.
  • tracker then links that name into the domain’s namespace (via the database)
  • tracker, in the background, starts replicating that file around until it’s in compliance with that file class’s replication policy
  • later, app issues a “get_paths” request for that domain+key (key == “filename”), and tracker replies (after consulting database/memcache/etc) with all the URLs that the file is available at, weighted based on I/O utilization at each location.
  • app then tries the URLs in order. (although the tracker’s continually monitoring all hosts/devices, so won’t return dead stuff, and by default will double-check the existence of the 1st item in the returned list, unless you ask it not to…)
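
The flow above can be mimicked with a small in-memory simulation in Python. This is a toy model and not the MogileFS client library: ToyDevice, ToyTracker, upload and download are names I made up, only the method names create_open, create_close and get_paths echo the requests described above. Load balancing is reduced to "least files first", replication runs synchronously instead of in the background, and the I/O-based weighting of paths is left out.

```python
class ToyDevice:
    """Hypothetical storage device: a dict of path -> bytes plus an up/down flag."""

    def __init__(self, name):
        self.name = name
        self.files = {}
        self.up = True

    def put(self, path, data):
        if not self.up:
            raise OSError(f"{self.name} is down")
        self.files[path] = data

    def get(self, path):
        if not self.up:
            raise OSError(f"{self.name} is down")
        return self.files[path]


class ToyTracker:
    """In-memory stand-in for a tracker (assumption: not the real API)."""

    def __init__(self, devices, replicas=2):
        self.devices = {d.name: d for d in devices}
        self.replicas = replicas  # hypothetical per-class replication policy
        self.namespace = {}       # (domain, key) -> device names holding the file

    def create_open(self, domain, key):
        # Placeholder load balancing: offer the two least-filled devices.
        ranked = sorted(self.devices.values(), key=lambda d: (len(d.files), d.name))
        return [d.name for d in ranked[:2]]

    def create_close(self, domain, key, device_name):
        # Link the key into the namespace, then bring it up to policy.
        self.namespace[(domain, key)] = [device_name]
        self._replicate(domain, key)

    def _replicate(self, domain, key):
        holders = self.namespace[(domain, key)]
        path = f"{domain}/{key}"
        data = self.devices[holders[0]].get(path)
        for device in self.devices.values():
            if len(holders) >= self.replicas:
                break
            if device.up and device.name not in holders:
                device.put(path, data)
                holders.append(device.name)

    def get_paths(self, domain, key):
        # Only return locations on devices currently believed to be up.
        holders = self.namespace.get((domain, key), [])
        return [name for name in holders if self.devices[name].up]


def upload(tracker, domain, key, data):
    """Try each suggested location in turn, then report where the write landed."""
    path = f"{domain}/{key}"
    for name in tracker.create_open(domain, key):
        try:
            tracker.devices[name].put(path, data)
        except OSError:
            continue  # write failed, retry elsewhere
        tracker.create_close(domain, key, name)
        return name
    raise OSError("no candidate location accepted the write")


def download(tracker, domain, key):
    """Ask the tracker for locations and try them in order."""
    path = f"{domain}/{key}"
    for name in tracker.get_paths(domain, key):
        try:
            return tracker.devices[name].get(path)
        except (OSError, KeyError):
            continue
    raise KeyError((domain, key))
```

For example, with three devices where the first is down, upload(tracker, "photos", "cat.jpg", data) skips the dead device, writes to the next candidate, and the toy tracker copies the file to one more device so that download keeps working if either copy goes away.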

One quirk in the above flow is that the app tells the tracker where it stored the file only after the file has already been written to disk. Since storing the file and recording its metadata are not a single atomic operation, there is a high possibility of the two getting out of sync, for example if the app dies between the write and the “create_close” call. I will explore further to see whether MogileFS has some internal mechanism to handle this quirk.
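
As a thought experiment, here is one generic way such a gap could be handled: record a "pending" entry before the write, promote it on commit, and periodically sweep away stale pending entries together with their orphaned data. Whether MogileFS does anything like this is exactly what I have not verified yet, so treat the Python sketch below as hypothetical; ToyCatalog and its methods are invented for illustration.

```python
import time


class ToyCatalog:
    """Hypothetical metadata store that tracks uploads still in flight."""

    def __init__(self):
        self.pending = {}    # upload_id -> (key, location, started_at)
        self.committed = {}  # key -> location
        self._next_id = 0

    def begin(self, key, location, now=None):
        """Register the intent to write before any bytes hit the disk."""
        self._next_id += 1
        started = time.time() if now is None else now
        self.pending[self._next_id] = (key, location, started)
        return self._next_id

    def commit(self, upload_id):
        """Promote a finished upload into the visible namespace."""
        key, location, _ = self.pending.pop(upload_id)
        self.committed[key] = location

    def sweep(self, storage, max_age, now=None):
        """Delete data for uploads that began but never committed in time."""
        now = time.time() if now is None else now
        removed = []
        for upload_id, (key, location, started) in list(self.pending.items()):
            if now - started > max_age:
                storage.pop(location, None)  # drop the orphaned bytes, if any
                del self.pending[upload_id]
                removed.append(location)
        return removed


if __name__ == "__main__":
    storage = {}  # stands in for the disk: location -> bytes
    catalog = ToyCatalog()

    ok = catalog.begin("a.txt", "dev1/0001", now=0)
    storage["dev1/0001"] = b"first"
    catalog.commit(ok)

    catalog.begin("b.txt", "dev1/0002", now=0)
    storage["dev1/0002"] = b"second"
    # the app "crashes" here, so commit is never called for b.txt

    print(catalog.sweep(storage, max_age=60, now=100))
    print(sorted(storage))
```

The point of the sketch is only that the metadata store learns about the file before the write starts, so an interrupted upload leaves a trace that a background job can clean up instead of a silent orphan on disk.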