|
Post by AnthroHeart on Apr 2, 2024 13:53:24 GMT
I created a UTF-8 to WAV Converter, so you can transform your emojis and foreign characters into a WAV file.
Here's what one document sounds like with Chinese lettering:
|
|
|
Post by AnthroHeart on Apr 2, 2024 15:19:26 GMT
The highest printable UTF-8 character on my computer is:
It might not print correctly here.
I tested it alongside some normal text, and it didn't seem to throw off the values that much.
But I'll see if I can work with 32-bit UTF-8 characters.
|
|
|
Post by AnthroHeart on Apr 2, 2024 15:27:47 GMT
The highest reasonable UTF-8 character is: 😂, U+1F602, which is well supported across platforms. It did add peaks to my otherwise normal UTF-8 text.
When I used a Chinese document, it didn't seem to affect the WAV that much.
|
|
|
Post by reden on Apr 2, 2024 15:29:00 GMT
That is an undefined character, U+10FFFF, the very last Unicode code point. It sits at the end of the supplementary Private Use Area (PUA, U+F0000 to U+10FFFF), a place where any company or group may define any character to mean whatever they wish. For example, Apple keeps its logo in a private-use code point that you can type from a Mac or iPhone.
|
|
|
Post by AnthroHeart on Apr 2, 2024 15:40:12 GMT
Should I set the highest code point my Unicode-to-WAV converter can read to 😂, U+1F602 (the most practical), or to U+10FFFF?
The latter might require using 32-bit values.
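As a rough check on the 32-bit question, the sketch below reports the code point, the number of bits it needs, and its UTF-8 byte length for a few characters. The sample characters and the `describe` helper are only illustrations, not part of the converter. Anything above U+FFFF, including 😂, already overflows a 16-bit value, while every Unicode code point fits in 21 bits.

```python
# Compare how much storage a few code points need.
samples = ["I", "中", "😂", "\U0010FFFF"]

def describe(ch):
    """Return (code point, bits needed, UTF-8 byte count) for one character."""
    cp = ord(ch)
    return cp, cp.bit_length(), len(ch.encode("utf-8"))

for ch in samples:
    cp, bits, nbytes = describe(ch)
    print(f"U+{cp:04X}: {bits} bits, {nbytes} UTF-8 bytes")
```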
|
|
|
Post by reden on Apr 2, 2024 16:07:56 GMT
The last meaningful printable character before the Supplementary and Tertiary Ideographic Planes (Chinese, Japanese, and Korean characters, tens of thousands of them) is 🯹, U+1FBF9, which looks like a 9 in a calculator font. 🯹 belongs to Symbols for Legacy Computing and sits further along than 😂, as seen in en.wikibooks.org/wiki/Unicode/Character_reference/1F000-1FFFF .
|
|
|
Post by AnthroHeart on Apr 2, 2024 16:09:39 GMT
I think I want to allow Chinese and foreign characters.
|
|
|
Post by reden on Apr 2, 2024 16:23:42 GMT
By "tens of thousands of them," I meant even more than there already are in the common plane. Several blocks of the Basic Multilingual Plane, the first and most commonly used plane, are already fully dedicated to thousands of Chinese and Korean characters.
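For anyone who wants to check which plane a character lives in, here is a minimal sketch. The `plane` helper is a hypothetical name; it relies only on the fact that each plane spans 0x10000 code points, so the plane number is the code point shifted right by 16 bits.

```python
def plane(ch):
    """Return the Unicode plane number (0 to 16) of a single character."""
    return ord(ch) >> 16

# Plane 0 is the Basic Multilingual Plane, plane 1 holds most emoji,
# and planes 2 and 3 are the Supplementary and Tertiary Ideographic Planes.
for ch in ["中", "😂", "\U00020000", "\U0010FFFF"]:
    print(f"U+{ord(ch):04X} is in plane {plane(ch)}")
```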
|
|
|
Post by AnthroHeart on Apr 2, 2024 16:31:32 GMT
I updated the converter to allow Chinese and other foreign-language text, and also emojis.
GPT4 tells me this is the highest emoji value: 🧿, U+1F9FF
I tested it with "🧿I am Love." and it did produce pulses but they weren't that bad. I used 0% smoothing. I'd rather not have to use logarithmic interpolation or anything. The audio isn't that bad.
However, if I use "🧿IIIIIIIIII", the audio goes flat.
|
|
|
Post by AnthroHeart on Apr 2, 2024 16:46:45 GMT
OK, it makes sense that a run of identical letters like "IIII" produces silence. When I tried "🧿ababababababab", the outlier pulled the audio below 100% of maximum. I may have to do some statistical analysis, or consult Claude once my usage limit resets. That input reduces the volume of the WAV by approximately 4 dB. It isn't bad, but it's not 100%.
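To illustrate why identical letters go silent and why one high code point squeezes everything else, here is a minimal sketch. It assumes a simple scheme (center the code points on their mean, then scale by the largest deviation); that is an assumption for illustration, not the converter's actual code, and it does not try to reproduce the 4 dB figure.

```python
def to_samples(text):
    """Map code points to floats in [-1.0, 1.0] (assumed scheme, see above)."""
    values = [ord(ch) for ch in text]
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    peak = max(abs(c) for c in centered)
    if peak == 0:
        # Every character is identical: nothing varies, so the output is silence.
        return [0.0] * len(values)
    return [c / peak for c in centered]

flat = to_samples("IIIIIIIIII")
plain = to_samples("abababab")
skewed = to_samples("🧿abababab")

print(flat)                                # all zeros
print(max(abs(s) for s in plain))          # the letters alone reach full scale
print(max(abs(s) for s in skewed[1:]))     # the same letters next to the emoji
```

In this scheme the emoji takes the full-scale slot, and the letters after it are left with only a small fraction of the range.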
|
|
|
Post by reden on Apr 2, 2024 17:42:48 GMT
-4 dB is rather little; it doesn't really matter. I've heard that, hardware-wise, it's important to keep speaker volume at 95-99% rather than 100%, as 100% could (though rarely) blow the speaker out.
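To put numbers on those levels, the sketch below converts between decibels and linear amplitude using the standard 20·log10 relation for amplitude. The helper names are hypothetical. A 4 dB drop leaves roughly 63% of full-scale amplitude, and a 95% peak sits a little under half a decibel below full scale.

```python
import math

def db_to_ratio(db):
    """Convert a level change in decibels to a linear amplitude ratio."""
    return 10 ** (db / 20)

def ratio_to_db(ratio):
    """Convert a linear amplitude ratio to decibels."""
    return 20 * math.log10(ratio)

print(round(db_to_ratio(-4), 3))    # amplitude left after a 4 dB drop
print(round(ratio_to_db(0.95), 2))  # level of a 95% peak relative to full scale
```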
|
|
|
Post by reden on Apr 2, 2024 17:48:54 GMT
The highest emoji is 🫸, U+1FAF8, Rightwards Pushing Hand.
|
|
|
Post by AnthroHeart on Apr 2, 2024 18:11:36 GMT
I tried to take out outliers, but that reduced the audio volume when using smoothing, so I gave up. If your input has a bunch of outliers (Unicode characters of way higher value than the rest of the text), it will reduce the overall volume. But it's not bad. It's like -4 dB or something.
|
|
|
Post by nathanmyersc on Apr 2, 2024 23:36:19 GMT
Hmm, you could collect the outliers after you've valued all the characters and convert them to unique values within the range you like. First get all the character values, then create a set of the unique values. Then find the highest ones and see how far they are from the average. If they are really far, clamp them to unique values within the range you like. I still cannot get the audio files made by your exe to work for the image writer; not sure about the Unicode writer at the moment. I hope it works, as I'd like to make some Sanskrit affirmations.
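One way to read the steps above is sketched below. The function name `remap_outliers` and the `max_gap` threshold are hypothetical choices for illustration; the post does not say how far is "really far" or which target range to use, so this version simply places each outlier just above the highest ordinary value.

```python
def remap_outliers(text, max_gap=1000):
    """Remap code points far above the average to unique nearby values.

    max_gap is an assumed threshold: a unique value counts as an outlier
    when it exceeds the average of the unique values by more than max_gap.
    """
    values = [ord(ch) for ch in text]
    if not values:
        return []
    unique = sorted(set(values))
    average = sum(unique) / len(unique)
    normal = [v for v in unique if v - average <= max_gap]
    outliers = [v for v in unique if v - average > max_gap]
    ceiling = max(normal)
    # Give each outlier its own slot just above the ordinary range.
    mapping = {v: ceiling + i + 1 for i, v in enumerate(outliers)}
    return [mapping.get(v, v) for v in values]

print(remap_outliers("🧿abab"))
```

Each outlier keeps a distinct value, so it still differs from every other character, but it no longer dominates the range.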
|
|
|
Post by AnthroHeart on Apr 2, 2024 23:40:13 GMT
Yes, I tried 1.5 standard deviations from the mean of all the text character values. It did take out the outliers, but it also reduced the amplitude when I used smoothing. That method could also take out characters that aren't far from the mean, so I dropped it.
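A minimal sketch of that 1.5 standard deviation filter is below, to show both effects. The function name `drop_outliers` and the use of the population standard deviation are assumptions; the original code is not shown in this thread.

```python
import statistics

def drop_outliers(text, k=1.5):
    """Keep only code points within k standard deviations of the mean."""
    values = [ord(ch) for ch in text]
    if len(values) < 2:
        return values
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) <= k * spread]

# The emoji is removed and the 14 letters survive.
print(len(drop_outliers("🧿ababababababab")))

# With no emoji at all, the ordinary letter "z" is still cut,
# because the cutoff is relative to whatever text is present.
print(drop_outliers("aaaaaaaaaz"))
```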
|
|