gwern comments on Open Thread for February 3 - 10 - Less Wrong

Post author: NancyLebovitz 03 February 2014 03:30PM, 6 points


Comment author: btrettel 04 February 2014 10:24:58PM, 7 points

How do you organize your computer files? How do you maintain organization of your computer files? Anyone have any tips or best practices for computer file organization?

I've recently started formalizing my computer file organization. For years my computer file organization would have been best described as ad-hoc and short-sighted. Even now, after trying to clean up the mess, when I look at some directories from 5 or more years ago I have a very hard time telling what separates two different versions of the same directory. I rarely left README-like files explaining what's what, mostly because I didn't think about it.

Here are a few things I've learned:

  • Decide on a reasonable directory structure and iterate toward a better one. I can't anticipate how a different structure might serve my needs better in the future, so I don't try very hard to. I can create new directories and move things around as needed. My current home directory is roughly structured into the following directories: backups, classes, logs, misc (financial info, etc.), music, notes, projects (old projects that preceded my use of version control), reference, svn, temp (files awaiting organization, mostly because I couldn't immediately think of an appropriate place for them), utils (local executable utilities).
  • Symbolic links are necessary when you think a file might fit well in two places in a hierarchy. I don't care too much about making a consistent rule about where to put the actual file.
  • Version control allows you to synchronize files across different computers, share them with others, track changes, roll back to older versions (where you can know what changed based on what you wrote in the log), and encourages good habits (e.g., documenting changes in each revision). I use version control for most of my current projects, even those that do not involve programming (e.g., my notes repository is about 700 text files). I don't think which version control system you use is that important, though some (e.g., CVS) are worse than others. I use Subversion because it's simple.
  • I store papers, books, and other writings I want to keep in a directory named reference. I try to keep a consistent file naming scheme: AuthorYearJournalAbbreviation.pdf. I have a text file that lists my own journal abbreviation conventions. If the file is not from a journal, I'll use something like "chapter" or "book" as appropriate. (Other people use software such as Zotero or Mendeley for this purpose. I have Zotero, but I mostly use it for citation management because I find it inconvenient for this purpose.)
  • In terms of naming files, I try to think about how I'd find the file in the future and try to make it obvious if I navigate to the file or search for it. For PDFs, you often can't search the text, so perhaps my file naming convention should include the paper title to help with searching.
  • README files explaining what's in a directory are often very helpful, especially when returning to a project after several years. Try to anticipate what you won't remember about a project once you've been away from it that long.
  • Synchronizing files across different computers seems to encourage me to make sure the directory structure makes at least some sense. My main motivation in cleaning things up was to make synchronizing files easier. I use rsync; another popular option is Dropbox.
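The AuthorYearJournalAbbreviation.pdf convention is easy to apply mechanically. Below is a minimal Python sketch, assuming the abbreviation text file holds one `ABBR<TAB>Full Journal Name` pair per line (the actual format of that file isn't stated). The function names `parse_abbreviations` and `reference_filename` are hypothetical.

```python
def parse_abbreviations(lines):
    """Parse 'ABBR<TAB>Full Journal Name' lines into {full name: abbreviation}.

    The tab-separated format is an assumption; blank lines and '#' comments
    are skipped, and a line without a tab raises ValueError.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        abbr, full_name = line.split("\t", 1)
        table[full_name.strip()] = abbr.strip()
    return table


def reference_filename(author, year, abbreviations, journal=None, kind=None):
    """Return a name like 'Smith2010JFM.pdf'.

    Journal articles use the recorded abbreviation; other items use a kind
    such as 'book' or 'chapter' in its place.
    """
    # Keep letters only, so 'van Dyke' becomes 'vanDyke' and 'O'Brien' 'OBrien'
    surname = "".join(ch for ch in author if ch.isalpha())
    if journal is not None:
        tag = abbreviations[journal]  # KeyError flags a journal not yet recorded
    elif kind is not None:
        tag = kind
    else:
        raise ValueError("give either a journal or a kind such as 'book'")
    return f"{surname}{year}{tag}.pdf"
```

For example, `reference_filename("van Dyke", 1982, {}, kind="book")` returns `vanDyke1982book.pdf`. Raising on an unknown journal, rather than inventing an abbreviation on the fly, keeps the abbreviation file the single source of truth.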

Using scripts to help maintain your files is enormously helpful. My goals are to have descriptive file names, to have correct permissions (important for security; I've found that files that touched a Windows system often have completely wrong permissions), to minimize disk space used, and to interact well with other computers. I have a script that I titled "flint" (file system lint) that does the following and more:

  • checks for duplicate files, sorting them by file size (fdupes doesn't do that; my script is pretty crude and not yet worth sharing)
  • scans for Windows viruses
  • checks for files with bad permissions (777, can't be written to, can't be read, executable when it shouldn't be, etc.)
  • deletes unneeded files, mostly from other filesystems (.DS_Store, Thumbs.db, Desktop.ini, .bak and .asv files where the original exists, core dumps, etc.)
  • checks for nondescriptive file names (e.g., New Folder, untitled, etc.)
  • checks for broken symbolic links
  • lists the largest files on my computer
  • lists the most common filenames on my computer
  • lists empty directories and empty files
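A few of these checks can be sketched in Python with only the standard library. This is not the flint script itself, which hasn't been shared. The names `lint` and `most_common_names` and the `NONDESCRIPTIVE` list are hypothetical, and the sketch only reports problems instead of deleting anything.

```python
import os
import stat
from collections import Counter

# Clutter files from other systems; reported here, not deleted.
JUNK_NAMES = {".DS_Store", "Thumbs.db", "Desktop.ini"}
# Hypothetical list of nondescriptive names, compared case-insensitively.
NONDESCRIPTIVE = {"new folder", "untitled", "untitled folder"}


def lint(root):
    """Yield (problem, path) pairs for a subset of the checks listed above."""
    for dirpath, dirnames, filenames in os.walk(root):
        if not dirnames and not filenames:
            yield ("empty directory", dirpath)
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if name.lower() in NONDESCRIPTIVE:
                yield ("nondescriptive name", path)
            if os.path.islink(path):
                # exists() follows the link, so False means the target is gone
                if not os.path.exists(path):
                    yield ("broken symlink", path)
                continue  # skip permission checks on links
            if name in JUNK_NAMES:
                yield ("junk file", path)
            mode = os.lstat(path).st_mode
            if stat.S_IMODE(mode) == 0o777:
                yield ("mode 777", path)
            if not mode & stat.S_IRUSR:
                yield ("not readable by owner", path)
            if stat.S_ISREG(mode):
                if not mode & stat.S_IWUSR:
                    yield ("not writable by owner", path)
                if os.path.getsize(path) == 0:
                    yield ("empty file", path)


def most_common_names(root, n=10):
    """Return the n most frequent file names under root as (name, count) pairs."""
    counts = Counter(name for _, _, files in os.walk(root) for name in files)
    return counts.most_common(n)
```

A run over the home directory could look like `for problem, path in lint(os.path.expanduser("~")): print(f"{problem}: {path}")`. The permission checks look at mode bits only, so they mean little on filesystems that don't store Unix permissions.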

I'd be very interested in any other tips, as I often find my computer file organization to be a bottleneck in my productivity.

Comment author: gwern 04 February 2014 11:03:15PM, 2 points

checks for duplicate files, sorting them by file size (fdupes doesn't do that; my script is pretty crude and not yet worth sharing)

How can identical files be sorted by file size?

Comment author: btrettel 04 February 2014 11:08:12PM, 0 points

My wording was unclear. I sort the list of duplicate files by file size, e.g., the list might be like 17159: file1, file2; 958: file3, file4. This is useful because I have a huge number of small duplicate files and I don't mind them too much.
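Outside of fdupes, the grouping and sorting described here can be sketched directly in Python. Files are first bucketed by size (only files of equal size can be identical), then same-size candidates are hashed with SHA-256, and the groups come back largest first, rendered in the `size: file1, file2` form above. The function names and the `min_size` cutoff (for skipping the many small duplicates) are hypothetical. Matching digests are treated as identical; a byte-for-byte comparison would be stricter.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files aren't loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicates_by_size(root, min_size=0):
    """Return [(size, [paths...]), ...] for sets of identical files, largest first."""
    by_size = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            size = os.path.getsize(path)
            if size >= min_size:
                by_size[size].append(path)
    groups = []
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        for same in by_hash.values():
            if len(same) > 1:
                groups.append((size, sorted(same)))
    groups.sort(key=lambda g: g[0], reverse=True)
    return groups


def format_groups(groups):
    """Render groups as lines like '17159: file1, file2'."""
    return [f"{size}: {', '.join(paths)}" for size, paths in groups]
```

Hashing only files that share a size avoids reading most of the disk, since a file whose size is unique is ruled out without being opened.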

Comment author: gwern 05 February 2014 01:36:23AM, 1 point

Ah. Well, you're right that it's not easy to do that... Might want to subscribe to the bug report so you know if anyone comes up with anything useful: http://code.google.com/p/fdupes/issues/detail?id=3