Comments on What Silence: Note to Shelf: Music, Language, Zipping (post by Jeremy Rice)

2008-09-11 12:39

[nod] On later introspection, I realized the flaw--at least for music.

It's somewhat misleading to say language recognition would be done by letter frequency: it would be more accurate to say "cluster frequency", since the ZIP algorithm is essentially finding patterns of adjacent terms and reducing them to single dictionary entries.

So where this would break down with sound is that the only information stored sequentially is the highest-frequency sounds, and these are the most variable and least characteristic part of the spectrum. ZIP dictionaries wouldn't tell you shit about the file.

If the method for storing musical scores weren't so asinine*, I would imagine zipping scores would tell you something about the music, since certain clusters of musical information are characteristic of a genre, if not of a composer.

...However, I'm not so sure about authorship. The use of closed-class items (shorter words) is one of the recognizable traits of a writer (as are spelling mistakes), and those are the most likely to be identified as recurring clusters... so I think there may be some merit to that line of thought.

Probably not much, though. : )

* This is based on my knowledge of the MIDI file format only.

Posted by Jeremy Rice

2008-09-11 12:10

I doubt it.
It probably works for ZIP archives by means of letter-frequency analysis -- each language has a default letter-frequency signature, I imagine. I don't see how that could be expanded to the characteristics you mentioned.

Posted by Victor
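A minimal sketch of the point the two comments disagree on, assuming Python's standard `zlib` module (DEFLATE, the method ZIP archives most commonly use) is a fair stand-in for ZIP. `shuffled` keeps a text's letters and letter frequencies but scrambles their order. If compression depended only on letter frequency, both versions would compress to about the same size; in practice the shuffled copy is much larger, because DEFLATE's savings come from repeated clusters it can replace with back-references. `zip_distance` and `guess_language` are hypothetical helper names for the usual "zipping" trick: append an unknown sample to a reference text in each language and pick the language whose reference makes the sample cheapest to compress. The two toy corpora are invented for illustration and are far too small for real use.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Bytes needed to store `data` with zlib's DEFLATE at maximum compression."""
    return len(zlib.compress(data, 9))


def shuffled(text: str, seed: int = 0) -> str:
    """Same characters and letter frequencies as `text`, in random order.

    Shuffling keeps the letter-frequency signature but destroys the
    adjacent clusters that DEFLATE replaces with back-references.
    """
    chars = list(text)
    random.Random(seed).shuffle(chars)
    return "".join(chars)


def zip_distance(reference: str, sample: str) -> int:
    """Extra compressed bytes needed for `sample` once `reference` is known.

    DEFLATE can only point back 32 KB, so `reference` should stay under
    that size for its clusters to be reusable by the sample.
    """
    ref = reference.encode("utf-8")
    both = (reference + " " + sample).encode("utf-8")
    return compressed_size(both) - compressed_size(ref)


def guess_language(references: dict[str, str], sample: str) -> str:
    """Return the key of the reference text that makes `sample` cheapest to add."""
    return min(references, key=lambda name: zip_distance(references[name], sample))


if __name__ == "__main__":
    # Hypothetical toy corpora; real use would need far more text per language.
    references = {
        "english": "the cat sat on the mat while the dog was sleeping in the sun. ",
        "italian": "il gatto dorme sul tappeto mentre il cane corre nel giardino. ",
    }
    print(guess_language(references, "the dog was sleeping on the mat"))

    # Same letter frequencies, very different compressed sizes.
    text = "the cat sat on the mat. " * 20
    print(compressed_size(text.encode()), compressed_size(shuffled(text).encode()))
```

Subtracting the reference's own compressed size isolates the cost of the sample, so references of different lengths can still be compared against one another.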