I’ve noticed some files I opened in a text editor have all kinds of crazy unrenderable chars
You’re looking for https://en.m.wikipedia.org/wiki/Character_encoding, which explains the funny characters.
Spank you, much :D
Spank?
Yes, see Binary-to-text encoding (e.g., Base64).
Can you comment on the specific makeup of a “rendered” audio file in plaintext? How is the computer representing every little bit of sound at any given point, the polyphony, etc.?
What are the conventions of such representation? How can a spectrogram tell pitches are where they are, how is the computer representing that?
Is it the same to view plaintext as analysing it with a hex-viewer?
There are two things at play here.
Formats like MP3 (or WAV, OGG, FLAC, etc.) provide a way to encode polyphony and stereo and such into a sequence of bytes.
And then separately, there’s Unicode (or ASCII) for encoding letters into bytes. These are just big tables which say e.g.:
- 01000001 = uppercase ‘A’
- 01000010 = uppercase ‘B’
- 01100001 = lowercase ‘a’
So, what your text editor does is look at the sequence of bytes the MP3 encoder produced, look each byte up in its table, and somewhat erroneously interpret them as individual letters.
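A minimal sketch of that lookup in Python, using a made-up handful of bytes standing in for the start of an audio file (the byte values here are illustrative, not a real MP3 header):

```python
# Arbitrary bytes, as a text editor might find at the start of an audio file.
data = bytes([0xFF, 0xFB, 0x90, 0x41, 0x42, 0x61, 0x00, 0x7F])

# Latin-1 maps every byte value 0-255 to *some* character, so the decode
# never fails -- the editor just shows mostly meaningless glyphs.
as_text = data.decode("latin-1")
print(repr(as_text))

# Bytes that happen to land on table entries for letters show up as letters:
print(as_text[3:6])  # 0x41, 0x42, 0x61 are 'A', 'B', 'a' in the table
```

The editor isn’t “wrong” in any deep sense; it’s just applying the character table to bytes that were never meant to be characters.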
Most binary-to-text encodings don’t attempt to make the text human-readable—they’re just intended to transmit the data over a text-only medium to a recipient who will decode it back to the original binary format.
I do understand I’m not able to read it myself, I’m more curious about the architecture of how that data is represented and stored and conceptually how such representation is practically organized/reified…
The original binary format is split into six-bit chunks (e.g., 100101), which in decimal format correspond to the integers from 0 to 63. These are just mapped to letters in order:
- 000000 = A,
- 000001 = B,
- 000010 = C,
etc.—it goes through the capital letters first, then lower-case letters, then digits, then “+” and “/”. It’s so simple you could do it by hand from the above description, if you were looking at the data in binary format.
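That by-hand procedure can be written out directly. Here’s a sketch of Base64 encoding built from exactly the description above (six-bit chunks mapped through capitals, lower-case letters, digits, then ‘+’ and ‘/’), checked against Python’s standard `base64` module:

```python
import base64

# The Base64 alphabet in the order described above.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def base64_by_hand(data: bytes) -> str:
    # Lay all the bits out in one long string...
    bits = "".join(f"{byte:08b}" for byte in data)
    # ...pad with zeros so the length is a multiple of 6...
    bits += "0" * (-len(bits) % 6)
    # ...then map each six-bit chunk (an integer 0-63) to a letter.
    out = "".join(ALPHABET[int(bits[i:i + 6], 2)]
                  for i in range(0, len(bits), 6))
    # Standard Base64 also pads the output with '=' to a multiple of 4.
    return out + "=" * (-len(out) % 4)

print(base64_by_hand(b"Man"))  # TWFu
print(base64_by_hand(b"Man") == base64.b64encode(b"Man").decode())  # True
```

The bytes of “Man” are `01001101 01100001 01101110`; regrouped into six-bit chunks that’s 19, 22, 5, 46, which the table maps to T, W, F, u.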
One representation of a sound wave is a sequence of amplitudes, expressed as binary values. Each sequential chunk of N bits is a number, and that number represents the amplitude of the sound signal at a moment in time. Those moments in time are spaced at equal intervals. One common sampling rate is 44.1 kHz.
That number is chosen because of the Nyquist–Shannon sampling theorem, combined with the fact that humans can hear frequencies up to roughly 20 kHz.
The sampling theorem says that if you want to reproduce a signal containing frequencies up to X, you need to sample it at more than 2X. Since 2 × 20 kHz = 40 kHz, a 44.1 kHz rate covers the full range of human hearing with a little headroom.
To learn more about this topic, look for texts, classes, or videos on “signal processing”. It’s often taught in classes that also cover electronic circuits.
Here is an example of such a text
That’s pretty dense reading, but if you’re willing to stop and learn any math you encounter while reading it, it will probably blow your mind into a whole new level of understanding the world.
I honestly wish I had gotten into all the science and physics of signal processing, taken calculus, etc. I feel like I’ll pick up a lot of the more qualitative stuff over time, particularly if I’m able to apply it in building apps that do some novel manipulations. Obviously some of that will require me to get an operational understanding of how to put all these blocks together.