So, for the last few weeks, in my spare time between installing a new WiFi system and reading up on various programming techniques and best practices, I have been re-ripping my hundreds of CDs into a lossless format because I finally have the space on my laptop. I am not sure how many CDs and LPs I have, but I would estimate between 600 and 800 CDs alone. That puts the upper limit of my CD collection at 560GB uncompressed (assuming 800 CDs, each completely filled with 700MB of music). I think the actual number will only be around 200GB, though, after accounting for compression, which usually squeezes things into about a third of the space, and adding in roughly 60GB of digitized vinyl. But I am not going to write about the numbers, since those are mostly fun for statisticians, math geniuses, and algorithm tinkerers. I might post the final stats for fun and do something with the data.
No, instead I want to talk about the metadata my computer has been pulling down from Gracenote’s CDDB. When CDDB first came to life, I had to submit CD names and track info to it far more often than I do now. I took the time, knowing it would help someone else out later. But as time went on, more people used the service and fewer cared about the quality of their submissions, so I noticed I had to correct more errors.
If one is only ripping one or two CDs at a time, corrections are fast and simple. But when one is ripping literally hundreds of CDs, these small errors add up to a lot of time and frustration, especially once I noticed that the old tags on my MP3 and AAC files conflicted with CDDB’s tags and that my original tags were more accurate than the ones CDDB suggested. So, I have continually had to correct quite a few things so that iTunes can smoothly replace the lossy files while keeping the good metadata, including my star ratings and hard-to-find album covers. I now present to you my list of grievances with CDDB and the people who submit info to it.
(1) Use the fields for what they are meant to hold. Do not add comments that do not appear on the track listing for a song. If a track says, “Cyclone (Alternative Crusher on Fire Mix),” do not enter “Cyclone Alt. Crusher On Fire Remix” or change it in any way. The only exceptions are the ones listed below, plus the case where an artist has a nasty habit of giving every remix the same name. In that case, list the specific mix name if possible.
(2) If an album is a compilation of many artists, use the “Artist” field for the recording artist’s name. “Various Artists” belongs in the Album Artist field. Also check the “Compilation” box so the computer flags it as a compilation and sorts it properly.
(3) In the case of collaborations such as “X featuring Y,” use whatever the credit reads on the album. Even worse is when someone crams the artist name, some arbitrary (and often incorrect) punctuation, and the song title all into the Name field.
(4) Title case should be used when artists use all upper case, all lower case, or camel case for their name or song title. I realize some artists stylize the casing, but it is a pain to synchronize and can be difficult to remember. (“KAltE fArbEn” looks cool, but I do not think the artist actually types that each time he writes about his band.) Also, articles and prepositions that are not essential to the name can be lowercased as they are in other titles. For instance, “A Song From The South” should probably be “A Song from the South.” However, if an article or preposition is part of a band’s name, it should be capitalized: “The Cure,” “The Smiths,” “The Band,” “The The.” If you lowercase any part of that last example, you look especially silly.
(5) Do not abbreviate if the text on the album does not. In general, copy the text on the album sleeve verbatim, except for the casing issues covered in #4 above.
(6) In multi-disc/LP sets, use the “disc x of x” fields. Do not put the disc number in the Album Title field. Computers will otherwise consider the discs unrelated, so certain sorts will not work correctly if the album title is polluted with things that are not the album’s title. It also makes it harder to tell how many individual works one has. And comments and non-normalized entries lead to the next issue.
(7) There is no standard for punctuating additional info that gets entered in the wrong place. A song title listed as “Escape (J.B.Mix)” might be entered as “Escape (remixed by J. Bayner)” because someone read the liner notes, which makes the punctuation difficult to synchronize. Also, if one person enters “Escape (…)” and another uses “Escape […]”, the tracks will look different to a computer’s comparison algorithm unless both files are run through an audio identifier (what most people call digital audio “fingerprinting”), which is time consuming and also susceptible to error. So, when you have to add distinguishing info to the track name, follow the standard set by the artist or album. In most cases, if I make an addition, I use brackets “[]”, and I use parentheses “()” when the comment is listed as part of the title. But that’s just my recommendation. Some cultures do the opposite.
(8) Track numbers go in the Track Number field. Total Tracks matters too, because it helps sort out which version of an album someone has, so use it. Artists are often released through different record companies in different countries because of licensing, and those CDs frequently make it into other countries. Also, re-issues from the exact same company sometimes include bonus tracks or reorganize the album.
(9) Use the characters that the artist uses. It irks me when I see “Leæther Strip” misspelled as “Leather Strip” or “Leaether Strip.” UTF-8 and the keyboard character map, or better yet PopChar, are your friends.
(10) Double-check your spelling before submission. If you see what looks like a typo on the album, check online to see whether they meant to misspell it before “correcting” it, or else you might actually be screwing it up.
(11) If you can’t be bothered to be accurate, do not even bother submitting track info, because you’re not helping anyone. In fact, you are adding confusion and making more work for those of us who care and need things to be accurate.
(12) This is to Gracenote: Hey, there are more than just 15 or 20 major genres. Try opening up your submission process to allow custom genre tags, and when you see a major trend in a musical style that doesn’t fit completely within another genre’s stylistic cues, break it out. Industrial is neither “Electronica,” nor “Rock,” nor “Alternative.” In fact, “Alternative” should probably be divided up into “Post-Punk,” “Grunge,” and the five or more other sub-genres that things get lumped into without really fitting. I seem to have to use the “Other” tag too much. That’s a sign of a bad schema, equivalent to a “Miscellaneous” or “Notes” field in a database.
Seriously, CDDB, analyze your edge cases to help define structure and expand your range (that’s data modeling 101), because it is always the exceptions to the rule that break the schema design and create tons of extra work if they are not fixed. Never equate a simply designed system with ease of use. It is actually a well-thought-out structure that is easy to grasp quickly, plus robust features (backed up by solid design) designed around the end user, that makes things easy to use.
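Going back to the submitter rules for a moment: rules 4 and 6 are mechanical enough to sketch in code. This is a rough Python sketch of my own, not anything CDDB or iTunes actually runs. The function names, the list of small words, the disc-suffix formats, and the “Greatest Hits” album title are all assumptions made up for illustration.

```python
import re

# Small words to keep lowercase mid-title. This list is my own assumption
# and is far from complete.
SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for", "from",
               "in", "nor", "of", "on", "or", "the", "to", "with"}


def _capitalize(word):
    """Uppercase the first letter, skipping leading punctuation like '('."""
    for i, ch in enumerate(word):
        if ch.isalpha():
            return word[:i] + ch.upper() + word[i + 1:]
    return word


def title_case(text):
    """Rule 4: title-case text, lowercasing small words unless first or last."""
    words = text.lower().split()
    last = len(words) - 1
    return " ".join(
        word if 0 < i < last and word in SMALL_WORDS else _capitalize(word)
        for i, word in enumerate(words)
    )


# Matches a trailing "(Disc 2)" or "[CD 1]" style suffix (assumed formats).
DISC_SUFFIX = re.compile(r"\s*[(\[]\s*(?:disc|cd)\s*(\d+)\s*[)\]]\s*$",
                         re.IGNORECASE)


def split_disc(album):
    """Rule 6: return (album title without disc suffix, disc number or None)."""
    match = DISC_SUFFIX.search(album)
    if match is None:
        return album, None
    return album[:match.start()], int(match.group(1))


print(title_case("A Song From The South"))   # A Song from the South
print(title_case("The The"))                 # The The
print(split_disc("Greatest Hits (Disc 2)"))  # ('Greatest Hits', 2)
```

The first and last words are always capitalized, which is why “The The” survives intact. The sketch is deliberately naive: it would flatten an acronym like “EBM” into “Ebm,” so a real tool would need an exception list.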
As a wise person once said, “Every genre is an alternative to another genre.” I think it was a particularly uncreative person who came up with “Alternative” as a genre in the first place. Depending on the decade, the label was applied equally to either Post-Modern (New Wave’s aftermath) or Grunge.
I have gotten into many discussions over the years about musical genres. I think that a tree could be made of all the genres and then we would have to draw the branches dividing, recombining and then shooting off in different directions in some cases. Meanwhile some bands would have to have dashed lines indicating heavy genre influences.
Since it is impossible to catalog all the species of music, why not just allow multiple genre tags? Darkwave can be either New Wave or Industrial, and Coldwave, while an offshoot of Industrial, also draws heavily on other styles. I have tons of tags because of this. I also have multi-genre tags: Electro-Industrial, EBM/Industrial, EBM/Ambient, etc. While we are at it, some albums have different styles by the same artist. Why does the CD hold the genre tag? Oh, and “Soundtrack”? Really?
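To make the multiple-tag idea concrete, here is a minimal Python sketch of what I have in mind, with the genres hung on each track instead of on the CD. It is not how Gracenote or iTunes stores anything; the `Track` class, the `tracks_in_genre` helper, and the placeholder titles and artists are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    title: str
    artist: str
    # A set of genre tags per track, instead of one genre per CD.
    genres: set = field(default_factory=set)


def tracks_in_genre(tracks, genre):
    """Return every track tagged with the genre, ignoring letter case."""
    wanted = genre.casefold()
    return [t for t in tracks if wanted in {g.casefold() for g in t.genres}]


# Placeholder data, made up for the example.
library = [
    Track("Song One", "Artist A", {"EBM", "Industrial"}),
    Track("Song Two", "Artist A", {"Ambient"}),
    Track("Song Three", "Artist B", {"Darkwave", "New Wave", "Industrial"}),
]

for track in tracks_in_genre(library, "industrial"):
    print(track.title)
```

This prints “Song One” and “Song Three.” A track tagged both EBM and Industrial shows up under either genre without my having to invent a combined “EBM/Industrial” tag, and two tracks by the same artist can carry completely different tags.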
Discogs is putting Gracenote/CDDB to shame, even though they are incredibly anal. Discogs makes me look like a hippy in terms of metadata standards.