Why Windows audio is horrible and how to improve it. (Guide)

william123098 · January 7, 2014, 9:01pm

I don't typically use Windows myself and, aside from what I do in school, I have used it maybe four times in the past year. Personally, I was able to hear a large difference in lossless audio files and very little in lossy ones (still an improvement). WASAPI and ASIO are by no means a perfect solution but they do serve to right many of the Windows wrongs; of course the ultimate solution is to do things properly and use Linux or Mac OS X. You can still use other plug-ins while using both ASIO and WASAPI, but I am not sure if there is any way of patching up compression on lossy audio files, and I have looked a fair bit.

Rugaliz · January 7, 2014, 9:01pm

okay on second inspection i notice a difference..primarily with the plugin the audio is slightly louder (which is good for old music) and maybe the sound is somewhat clearer..but it's also difficult to differentiate..i'll be doing more tests in the future..

I used the songs "whole lotta love" and "catch the rainbow"..now which bit rate should i choose? 16 bit default of my CDs? 95% of my music is flac so i don't worry too much about the source ;)

After more tests, i may declare the plugin worth the hassle

william123098 · January 7, 2014, 9:02pm

Also, I may very well add an Audacious guide for Windows at some point.

Rugaliz · January 7, 2014, 9:07pm

I have a question. Before i tried the plugin, i had my foobar output set to my speakers; i read online somewhere that this bypassed the windows audio kernel, and i also noticed the difference between the plugin and the other bypass...so which do you guys think is better?

william123098 · January 7, 2014, 9:16pm

They both serve the same function and do essentially the same thing, so it would be very subjective, I think. Just see which one you think is better. Personally, I had a slight preference for WASAPI.

william123098 · January 7, 2014, 10:37pm

Bumping this, people need to know.

Samuli · January 7, 2014, 11:45pm

Why windows audio is bad: It is due to the ungodly awful way in which the Windows audio stack itself actually functions. It begins by upscaling all audio (meaning literally all audio played on the OS, regardless of source) passed through it to a 32-bit floating-point sample depth. This would not be too bad apart from the fact that it is done in a really very messy manner by the Windows audio stack, which could potentially (and does) cause some samples to become slightly off from the true original value. This audio is then downscaled from the 32-bit floating-point sample depth to the highest number of bits per sample your audio hardware can utilize, usually 16 bits.

I looked at how matlab does converting integers to floating point and it is... drum roll "int / 0x8000". Considering the simplicity, and the fact that you can losslessly upconvert any bit depth integers just by adding 0's after the lsb's (least significant bits) to any arbitrarily large bit length, Microsoft must have seriously fucked something up to let this past their know-how.
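To make the "int / 0x8000" point concrete, here is a minimal Python sketch that checks the round trip for every 16-bit value. It only demonstrates that this particular conversion is exact in 32-bit floating point; it says nothing about what the Windows audio stack actually does internally. The helper names are made up for this illustration.

```python
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def int16_to_float32(sample):
    # The "int / 0x8000" conversion: maps [-32768, 32767] onto [-1.0, 1.0).
    return to_float32(sample / 0x8000)

def float32_to_int16(value):
    # Inverse conversion: scale back up and round to the nearest integer.
    return round(value * 0x8000)

def pad_to_24_bit(sample):
    # Integer upconversion: append eight zero bits after the LSB.
    return sample << 8

def round_trip_is_lossless():
    """Check every 16-bit sample value for both conversions."""
    for s in range(-0x8000, 0x8000):
        if float32_to_int16(int16_to_float32(s)) != s:
            return False
        if pad_to_24_bit(s) >> 8 != s:
            return False
    return True

print(round_trip_is_lossless())
```

A 32-bit float has a 24-bit significand, so every 16-bit integer divided by a power of two fits without rounding, which is why the check passes for all 65536 values.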

For example, you play a piece of music and there are two samples of audio playing in one part. One of these samples could be 96 and the other 91. You are doing so with the audio mixer set to 10% of the original volume. This means that those two samples are now 9.6 and 9.1, respectively. Now, imagine you are using audio hardware that allows for 16 bits. This means that the 9.6 will now become either 9 or 10 and the 9.1 will become a 9. This will cause a large impact on the audio quality.
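The arithmetic in that example can be reproduced with a short Python sketch. The function name and the `truncate` flag are invented for illustration; whether a real mixer rounds or truncates is not stated in this thread, so both are shown.

```python
def scale_and_quantize(sample, volume, truncate=False):
    """Scale an integer sample by a volume factor and return an integer again.

    truncate=False rounds to the nearest integer,
    truncate=True simply drops the fractional part.
    """
    scaled = sample * volume
    return int(scaled) if truncate else round(scaled)

for sample in (96, 91):
    rounded = scale_and_quantize(sample, 0.1)
    truncated = scale_and_quantize(sample, 0.1, truncate=True)
    print(sample, "->", rounded, "(rounded),", truncated, "(truncated)")
```

With rounding, 96 becomes 10 and 91 becomes 9; with truncation both become 9, which is the "either 9 or 10" case described above.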

This is an argument FOR using 32-bit floating point. In a mixer that works with 16-bit integers there isn't any space between, say, 10 and 11, but in floating point, because of the way it is defined in the IEEE 754 standard, there are smaller steps between smaller numbers and bigger gaps between the bigger ones. In 32-bit floating point, the step to the next possible value is about one ten-millionth of the number's value. For a simple volume adjustment (a scaling factor) this is not remarkable: you still get about the same noise level in the output as if the bit depth had been kept at 16 bits through the mixer. It's just a fact of life that this is how it works. But the up and down conversion doesn't hurt, especially if you can output to 24 bit instead, where a lot more precision is maintained (to the point where the noise floor is likely lower than that of the rest of the equipment). Where 32-bit floating point really shines, though, is when more demanding signal processing is needed, say an EQ. There you don't have to put up with the raised noise floor of 16-bit rounding errors accumulating with every operation done on the signal.
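The step-size claim can be checked with a small Python sketch that looks at the bit pattern of a 32-bit float and asks how far away the next representable value is. The function names are made up for this illustration, and the sketch only covers normal, positive values.

```python
import struct

def float32_step(x):
    """Gap between positive x (as a 32-bit float) and the next larger 32-bit float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    current = struct.unpack("<f", struct.pack("<I", bits))[0]
    following = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return following - current

def float32_relative_step(x):
    """Step size as a fraction of the value itself."""
    return float32_step(x) / x

def integer_relative_step(value):
    """Integer samples always move in steps of 1, whatever the value."""
    return 1 / value

for x in (0.001, 0.5, 10.0, 30000.0):
    print(x, float32_relative_step(x))
print("integer sample 10:", integer_relative_step(10))
```

The relative step of the float stays between 2^-24 and 2^-23 (roughly 0.6 to 1.2 ten-millionths) at every magnitude, while an integer sample of value 10 can only move in steps of 10% of its value.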

Granted, the Windows mixer has at least one fault. It doesn't pass a bit-perfect signal: dithering is done on the down-conversion back to 16-bit integer regardless of whether any DSP was applied (and yes, volume control does count as signal processing). This means that if you use any volume control on your PC, bypassing the mixer seems pointless. But I'll hold off on my judgment in case an actual technical explanation of the wrongdoings of the Windows sound stack emerges.
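As a rough sketch of why dithering on the way back to integers breaks bit-perfect output, the Python below requantizes samples that are already exact integers, once plainly and once with dither added first. The kind of dither the Windows mixer uses is not stated in this thread; TPDF dither of plus or minus 1 LSB peak is an assumption made purely for illustration, and clipping at the ends of the range is ignored.

```python
import random

def requantize_plain(samples):
    """Round each sample to the nearest integer with no dither."""
    return [round(s) for s in samples]

def requantize_with_dither(samples, rng):
    """Add TPDF dither (sum of two uniform values, +/-1 LSB peak), then round.

    ASSUMPTION: TPDF is used here only as a common textbook dither;
    it is not claimed to be what Windows does.
    """
    out = []
    for s in samples:
        dither = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        out.append(round(s + dither))
    return out

rng = random.Random(1234)  # fixed seed so the run is repeatable
original = [rng.randint(-32768, 32767) for _ in range(1000)]

plain = requantize_plain(original)
dithered = requantize_with_dither(original, rng)

changed = sum(1 for a, b in zip(original, dithered) if a != b)
print("plain output identical:", plain == original)
print("samples changed by dither:", changed, "of", len(original))
```

Without dither the integer samples pass through untouched; with dither some of them move by one LSB even though no volume change or other processing was applied.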

william123098 · January 8, 2014, 12:00am

Finally, a proper rebuttal.

I looked at how matlab does converting integers to floating point and it is... drum roll "int / 0x8000". Considering the simplicity, and the fact that you can losslessly upconvert any bit depth integers just by adding 0's after the lsb's (least significant bits) to any arbitrarily large bit length, Microsoft must have seriously fucked something up to let this past their know-how.

You are mostly correct here. You underestimate the stupidity of Microsoft though.

As for your second point, you could not be more wrong. The audio stack works in this way: upscale > downscale to hardware's requirement > filter down sound to adjust audio level.

It appears you wrongly assumed that the Windows audio stack decreases the noise level while it's still using the 32-bit floating-point stack.

Zoltan · January 8, 2014, 1:33am

I posted on this earlier today, before this thread, in the MP3 versus FLAC thread, and a couple of months ago on the forum, where I explained the phenomenon and illustrated it with practical examples back in April or so.

The problem isn't the resampling per se, even though the interpolation is faulty in upsampling. The problem is the capacity of the data stream: audio data is thrashed by the Windows audio stack, probably because there is a system resource capping mechanism in place to avoid stutter or something, and the double requantization forces it to reduce the floating sampling frequency so much that the signal is heavily affected. The deterioration obviously happens in the fixed requantization/resampling internally in the audio stack, because the output format is determined by the audio hardware and the input format by the source.

What the audio stack does exactly is not known, since it's closed source, but incidents with software for Microsoft Windows running in Linux suggest that the DirectAudio API, which targets the Windows audio stack, isn't capable of filling a normal set of fixed buffers in time and needs to be prebuffered enormously to avoid buffer underrun. That would cause the data-hungry 32-bit floating-point requantization/resampling to be starved of the data necessary to keep up a high sampling rate, but as it has a fixed resolution of 32 bits, it keeps on filling segments with less data per sample segment, either heavily interpolating scarce data or reducing the sampling rate enormously, but in all cases heavily deteriorating signal quality.

What exactly goes wrong is not clear. The fact is that it goes wrong, and that in order to prevent buffer underrun of a standard 16/44.1 connector, Microsoft applications targeting the DirectAudio API need to be pre-buffered with 30 to 120 ms (60 on most systems) of latency. This is not a problem for PulseAudio, because that auto-adjusts its latency and can even rebuffer as a whole to avoid buffer overrun for streams from really intense applications.
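To put the 30 to 120 ms figures in perspective, here is a small Python sketch that converts a latency into a buffer size for a 16/44.1 stream. Stereo (two channels) is an assumption, since the channel count isn't given above, and the function name is made up for this illustration.

```python
def buffer_size(latency_ms, sample_rate=44100, bits_per_sample=16, channels=2):
    """Return (frames, bytes) needed to hold latency_ms of PCM audio.

    ASSUMPTION: channels=2 (stereo); the post only specifies 16/44.1.
    """
    frames = sample_rate * latency_ms // 1000
    size_bytes = frames * channels * bits_per_sample // 8
    return frames, size_bytes

for latency in (30, 60, 120):
    frames, size_bytes = buffer_size(latency)
    print(f"{latency} ms -> {frames} frames, {size_bytes} bytes")
```

At 60 ms that is 2646 frames, or 10584 bytes of 16-bit stereo audio held in the buffer before playback.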

Microsoft was just too ambitious with this, and made several serious mistakes. As I explained earlier in the other thread, before most of my posts were somewhat reprised in this thread, Microsoft just never fixed it because they didn't gain anything in terms of media sales from the Windows platform. They switched focus to the Xbox platform about a decade ago, and that is a platform where they do commercially benefit from a quality audio stack, and where they go to great lengths (using a hypervisor and virtualizing both the Xbox OS and a very thin Windows client) to adapt to the shortcomings of their Windows OS while keeping the Xbox capable of a better media and games experience, without having to change anything in the Windows OS, so that they can keep up old software patents and license deals based upon them. They couldn't sell the XBone with an audio stack like the one in Windows, but they had to provide compatibility with the DirectAudio API targeted at the Windows audio stack, and at the same time they had to provide a better audio stack with an updated API for the Xbox OS, for which a better audio experience is paramount.

thanos · January 8, 2014, 8:37am

By the way would this still work with a USB driverless DAC?

william123098 · January 8, 2014, 4:00pm

Yes, although ASIO4ALL may not. WASAPI most probably will.

thanos · January 8, 2014, 5:39pm

well uh maybe/most likely wasn't really the answer i was hoping for. is there anyone here who could test this out?

Zoltan · January 8, 2014, 9:09pm

ASIO4ALL most certainly will work, because it's made specifically for standard-compliant audio interfaces. If your USB DAC is driverless, that means that it's standard-compliant, so ASIO4ALL will work.

DiParagon · January 9, 2014, 6:03am

I need to find the link, but the guy who makes foobar2k stated there is no difference at all between using DirectSound and WASAPI or ASIO.

raimeken · January 9, 2014, 6:26am

Personally I agree with the plugin descriptions for both WASAPI and ASIO: they make zero difference and in fact make my life miserable whenever I want to play some music quickly during some downtime, only to see that foobar has muted everything else. Sometimes I do things like these to have a warm and fuzzy feeling inside, but it's just a hassle using WASAPI. And please, if I have to switch to DirectSound whenever I have to do something other than passively listen to music, that is way too inconvenient.

thanos · January 9, 2014, 7:40am

well then what's the point?

Zoltan · January 9, 2014, 7:50am

I was afraid of that. I think I had registered that the ASIO function in Foobar is just a compatibility layer, and doesn't solve the problem. That's why I posted not to use Foobar in the other thread, but I'm not a Windows specialist, and I can't tell for sure because I don't have Windows on any of my PCs. I was convinced, as I stated in the MP3 v FLAC thread before this thread came about, that you could only solve the problem by using a media application targeted for ASIO, like a DAW, and an audio card with a native ASIO driver or a class-compliant audio card with ASIO4ALL.

Somebody needs to check this though. It's important, because I can clearly hear the difference between MP3 and FLAC even on the speakers of my work laptop (with Linux).

thanos · January 9, 2014, 8:20am

Okay, I've just tested it with and without, and admittedly there is an audible difference between using WASAPI and not. With WASAPI the music was louder (not really a pro or con to be honest) but the sound was clearer: lows, mids and highs were more distinguishable and not so muffled. So that's something, and frankly put I was skeptical of whether it even did anything at all, so my bias actually sat against the claimed benefits (even though they were not specified, just that it sounds "better").

but still it would be interesting to see if anyone can dig up info on how it all works.

NeOZeN · January 9, 2014, 9:02am

Cheers for this Bill, giving it a go now.

Samuli · January 9, 2014, 10:23am

Okay, I've just tested it with and without, and admittedly there is an audible difference between using WASAPI and not. With WASAPI the music was louder (not really a pro or con to be honest) but the sound was clearer: lows, mids and highs were more distinguishable and not so muffled.

"Louder is better" is one of the big truths in psychoacoustics, so that is a good candidate for the difference you heard. That's not to say there couldn't be any perceptible difference, since the Windows mixer does do dithering on the signal even if the volume control is left untouched, but the only real test is doing it blind with normalized volume (using an analog volume control, of course).

The above only applies if you're outputting 16 bit. For 24 bit, the added noise from dithering is at such a low level that it should be completely inaudible. In fact, the noise already in the recording from the microphones in the studio is at a higher level, and DACs, audio op-amps, noise picked up by the interconnects and traces on the PCB and so on are almost invariably more detrimental to the sound. The last few bits are buried in noise, so dithering just doesn't matter.
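For a rough sense of scale, the textbook signal-to-quantization-noise figure for an ideal N-bit converter with a full-scale sine wave is 20·log10(2^N) + 10·log10(1.5), about 6.02·N + 1.76 dB. The Python sketch below evaluates it for 16 and 24 bits. It is an idealized number that ignores dither and every real-world noise source mentioned above, and the function name is made up for this illustration.

```python
import math

def ideal_snr_db(bits):
    """Theoretical SNR in dB of an ideal N-bit quantizer for a full-scale sine."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

snr_16 = ideal_snr_db(16)
snr_24 = ideal_snr_db(24)
print(f"16 bit: {snr_16:.1f} dB")
print(f"24 bit: {snr_24:.1f} dB")
print(f"difference: {snr_24 - snr_16:.1f} dB")
```

The extra 8 bits push the quantization noise roughly 48 dB further down, which is why noise added at the 24-bit level ends up well below the noise of microphones and analog stages.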