McGlaspie | www.team156.com | Join Date: 2010-07-26 | Member: 73044 | Members, Super Administrators, Forum Admins, NS2 Developer, NS2 Playtester, Squad Five Blue, Squad Five Silver, Squad Five Gold, Reinforced - Onos, WC 2013 - Gold, Subnautica Playtester
edited November 2017
@Nordic pinged me to respond in here to clear up some of the technical mumbo-jumbo...
So, hopefully this either helps the discussion or clears up some of the technicalities on this topic.
Spark Engine uses FMOD v4.44.58, which by software standards is a very out-of-date version. Spark also does _not_ initialize any sound hardware for its sound system. There have been a large number of technical complications over the years with getting the correct Sound Device ID fed into FMOD (among other things), and quite some time ago it was deemed safer and easier to use a purely CPU-based sound system. While Spark does send commands/signals to a system's audio device(s), it does not bind itself directly or exclusively to them. As a result, all the sound channels are managed on the CPU and are not bound (or limited, for that matter) by hardware capabilities. One of the bigger issues arose from fetching the correct Device GUID (Audio Device ID) across all systems when Linux support was added. This is further compounded by trying to normalize Linux and Windows behavior when it comes to hardware device IDs (not "fun").
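For the curious, the shape of that init path looks roughly like the sketch below. To be clear, this is an illustration against the FMOD Ex (4.x) C++ API, not Spark's actual code; the driver index, channel count, and the idea of matching a stored device ID are all just placeholders.

```cpp
// Illustrative only -- not Spark's init code. Shows the FMOD Ex (4.x) pattern of
// enumerating output devices and running everything through the software mixer.
#include <fmod.hpp>
#include <cstdio>

int main()
{
    FMOD::System *system = nullptr;
    FMOD::System_Create(&system);

    // Device enumeration is where the cross-platform pain lives: the GUID you
    // get on Windows has no tidy equivalent once Linux audio backends are involved.
    int numDrivers = 0;
    system->getNumDrivers(&numDrivers);
    for (int i = 0; i < numDrivers; ++i)
    {
        char name[256];
        FMOD_GUID guid;
        system->getDriverInfo(i, name, sizeof(name), &guid);
        std::printf("driver %d: %s\n", i, name);
    }
    system->setDriver(0); // or whichever index matched the user's stored device ID

    // Everything mixed on the CPU: reserve software channels and never claim
    // exclusive hardware voices (the sounds themselves get the FMOD_SOFTWARE flag).
    system->setSoftwareChannels(64);
    system->init(64, FMOD_INIT_NORMAL, nullptr);

    // ... load event banks, run the game, call system->update() once per frame ...

    system->release();
    return 0;
}
```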
"So...why not just upgrade, right?"
To put it simply... that would be a metric crap-ton of work, and not purely programmer time. Moving to the "modern" version of FMOD would require all of the Sound Banks to be rebuilt. This means every single sound event would need to be built from scratch (and there are literally hundreds of them). As far as I know, there isn't a simple "Import"-like utility for the newer versions that is both safe and reliable. Also, FMOD licensing has changed over the years, and if my memory serves correctly an upgrade would require Unknown Worlds to fork over a not-insubstantial fee to FMOD. Combine that with the unknowns of what _new_ issues would come from the upgrade, and the man-hours to make it happen (on top of the debugging/testing/fixing man-hours) push the proposal into unattainable territory. All of the points I just illustrated would apply even _more so_ if we jumped to a different sound engine (e.g. Steam Audio or Wwise), not to mention the additional potential monetary costs.
"What's up with the Hive screams sounding like an inept DJ is trying to go all '1990s remix' on them?"
All of the sounds in the game are associated with an FMOD Event (think of this as an ID in FMOD's sound database). Each event not only specifies which sounds to play, but also any DSP filters (reverb, occlusion, dB falloff, etc.) applied to them. There is also a maximum number of those events that can play at once, as well as a priority for how important each individual event is. As a result (and we all know how many "important" sounds there are in NS2), some of those priorities can either conflict or just dominate the soundscape. Hive screams, Distress Beacons, and Power Node changes are obvious examples of "big and important" sounds. The combination of software-managed sound channels and high-priority sound events creates a perfect storm for conflicting playback of all the audio sources (the actual individual audio in a given Sound Event). This conceptual issue is something I want to revisit and potentially diminish when I can find the time (it would require reviewing hundreds of sound events and tuning them as needed).
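If it helps to picture the channel side of it, the snippet below is a toy model of priority-based channel stealing. It is not FMOD code and those aren't our real priorities or limits, just the general behavior you get when a fixed voice budget meets several "must play" events at once.

```cpp
// Toy model (not FMOD code) of what a fixed software voice budget does when
// several "important" events fire at once: lower-priority voices get stolen.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Voice { const char *name; int priority; }; // lower number = more important

int main()
{
    const size_t kMaxVoices = 4; // pretend the mixer only has 4 channels free
    std::vector<Voice> playing = {
        {"footsteps",       128},
        {"rifle fire",       64},
        {"hive scream",       8},
        {"distress beacon",   8},
    };

    // A new high-priority event arrives while the budget is full...
    Voice incoming{"power node destroyed", 8};
    if (playing.size() >= kMaxVoices)
    {
        // ...so the least important currently-playing voice is dropped.
        auto victim = std::max_element(playing.begin(), playing.end(),
            [](const Voice &a, const Voice &b) { return a.priority < b.priority; });
        std::printf("stealing channel from '%s'\n", victim->name);
        *victim = incoming;
    }
    return 0;
}
```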
"Why do sounds travel so absurdly far when they shouldn't?"
This is a little complicated, but I'll take a swing at it. Almost all of the Sound Events in NS2 (keep in mind, managed and processed by FMOD) have a few additional layers applied to them, more specifically Obscurance and Reverb. When Spark is loading a map, part of the loading process is to set up a 3D representation of the game world for FMOD. This is done not by supplying it with 100% of the visual data for a given level; instead, all of the Occlusion data for a given map is fed to FMOD, so it builds its own "version" of the 3D game world. This approximated version of the game world is what FMOD uses to determine how obscured a given sound source is relative to the position of the listener. As a result of how FMOD is configured, the weighted importance of obscurance, and the data provided on load, some sounds travel further than they should. The classic example is Marine Start on Veil and the Nano-Grid location. This comes down to the physical 3D distance (think tracing a straight line) compared to how much "obscuring geometry" sits between the sound source and a listener. The bulk of the 3D data provided to FMOD is derived from a map's Occlusion Culling geometry. As a result, the closer a sound event and a listener are in straight-line terms, the more likely the audio is to travel through walls when it seemingly shouldn't, regardless of map distance (i.e. the actual in-game volumetric distance between them).
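For reference, that hand-off at map load is conceptually like the sketch below. The geometry calls are real FMOD Ex (4.x) API, but the data layout, the loop, and the occlusion values are made up for illustration; Spark's real loader is more involved.

```cpp
// Sketch of how occlusion geometry gets handed to FMOD Ex at map load -- the
// API calls exist in FMOD 4.x, but the surrounding code is invented for
// illustration and is not Spark's actual loader.
#include <fmod.hpp>
#include <vector>

struct OcclusionTri { FMOD_VECTOR v[3]; };

void buildSoundWorld(FMOD::System *system, const std::vector<OcclusionTri> &tris)
{
    system->setGeometrySettings(1000.0f); // max world size, in game units

    FMOD::Geometry *geometry = nullptr;
    system->createGeometry((int)tris.size(), (int)tris.size() * 3, &geometry);

    for (const OcclusionTri &tri : tris)
    {
        int polyIndex = 0;
        // directocclusion / reverbocclusion are 0..1 -- this is the knob a
        // per-wall "density" would map onto. Here every wall blocks the same.
        geometry->addPolygon(0.7f /*direct*/, 0.5f /*reverb*/, true /*doublesided*/,
                             3, tri.v, &polyIndex);
    }
    // From here on, FMOD tests listener<->source against this mesh on update
    // to decide how obscured a playing event should sound.
}
```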
Is this all A-Okay and "Perfect"? ...Nope, not imo...
Personally speaking, I very much appreciate both the immersive importance of audio and its contextual cues for gameplay. Good audio goes a very, very long way toward both expressing gameplay mechanics and audibly painting the full picture of a fictional world. Along these lines, I've done a number of experiments this year trying to find an appropriate balance of contextual audio cues that jives with NS2. Unfortunately, my more ambitious experiments aren't practical to push with an update, the reason being that they simply aren't performant enough. I first tried performing true 3D volumetric calculations from sound source to sound receiver. While that worked in concept, it carried a very significant CPU cost. I then tried utilizing the navigation mesh data (which all levels are required to have) to decrease that cost, and the reduction in CPU time was wholly insignificant. I've semi-recently experimented with rolling an interval-additive factor into NS2's sound obscurance. While this does defer most of the CPU cost across multiple frames, it still causes far too much of a "CPU expense" to be pushed with an update. I'm currently working on a new approach (on the side) that simplifies the problem but balances the need for "volumetric" sound influences against CPU cost. Hopefully, I can cram it into a (soon) future update.
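To give a feel for the "defer the cost across frames" idea in very rough, generic terms (this is not the actual NS2 code, and the names and numbers are invented):

```cpp
// Generic time-slicing pattern: bound the per-frame cost of occlusion updates
// by only refreshing a small slice of sources each frame. Not the NS2 code.
#include <cstddef>
#include <vector>

struct SoundSource { float occlusion = 0.0f; /* position, etc. */ };

// Placeholder for whatever expensive volumetric / nav-mesh test you'd run.
float computeOcclusion(const SoundSource &) { return 0.5f; }

class OcclusionUpdater
{
public:
    explicit OcclusionUpdater(std::vector<SoundSource> *sources) : sources_(sources) {}

    // Called once per frame: touch only kPerFrame sources, so the per-frame cost
    // is bounded, at the price of each source's occlusion value being up to
    // (numSources / kPerFrame) frames stale.
    void update()
    {
        const std::size_t kPerFrame = 4;
        for (std::size_t i = 0; i < kPerFrame && !sources_->empty(); ++i)
        {
            SoundSource &s = (*sources_)[cursor_ % sources_->size()];
            s.occlusion = computeOcclusion(s);
            ++cursor_;
        }
    }

private:
    std::vector<SoundSource> *sources_;
    std::size_t cursor_ = 0;
};
```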
Apologies if I didn't address a specific concern or question anyone had.
My intent with this post was to clear up some of the technicalities around NS2's sound events and propagation.
The TL;DR...
Audio playback in game engines is _not_ simple. Especially in a networked context.
FMOD v4.44.xx, while not ideal, is what we've got and what we're stuck with.
Yes, there is room for improvement with NS2's audio playback and obscurance.
We're continuing to look into ways to improve sound propagation in the 3d-space of NS2's maps.
You know, it's actually really sad that proper sound hardware and support in games is so unappreciated these days, and is the last thing tacked onto most people's new rigs... Even in this thread, specifically about improving audio, I'm getting this vibe of "meh, sound is good enuf on my hardware". And I'm not even going into people playing on onboard motherboard chips; they simply don't know what they're missing.
Regarding 3D audio with stereo output:
Reconstructed 3D audio doesn't work for everyone. I found myself trying some company's (I forget the name) 3D-over-headphones algorithms at a professional broadcasting convention; my colleague said he was impressed with how accurate the positioning sounded to him, but I heard pretty much nothing but stereo.
The engineer presenting the demo explained that, to achieve 3D positioning, they studied how sounds bounce off the average person's face and ears into their eardrums, by placing small microphones in a test subject's ears and studying the delay between each ear, phase differences, and other things I don't remember / didn't understand. Then, to create 3D audio, they could replicate the transformations they measured and trick the brain into hearing the sound as "coming from" a specific spot.
The problem is that everybody looks different, so they had to tweak their model for the average human head. So while it should work okay for most people, it will work perfectly for some and not so well for others.
In my case, my ears stick out a bit from my head, so maybe that's why it doesn't work well for me? Anyway, until I make my own model with tiny microphones in my ears, I'll never know what I'm missing :P
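For anyone wondering how tiny that "delay between each ear" actually is, here's a back-of-the-envelope calculation using Woodworth's spherical-head approximation (the head radius and angle are just textbook-ish example numbers, not anything from that demo):

```cpp
// Interaural time difference (ITD) for a source off to one side, using the
// Woodworth spherical-head approximation: ITD = (a / c) * (sin(theta) + theta).
#include <cmath>
#include <cstdio>

int main()
{
    const double kPi          = 3.14159265358979;
    const double headRadius   = 0.0875;   // metres, roughly an average head
    const double speedOfSound = 343.0;    // m/s
    const double azimuth      = kPi / 2;  // source 90 degrees to one side

    const double itd = (headRadius / speedOfSound) * (std::sin(azimuth) + azimuth);
    std::printf("ITD at 90 degrees: %.0f microseconds\n", itd * 1e6);
    // ~656 us -- a tiny cue, which is why the model really has to fit *your* head.
    return 0;
}
```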
For me, sound is more important than graphics in a game. A game with great sounds and bad visuals will keep me entertained longer than a game with great visuals and bad sound.
I work with audio and video all the time, and yep, everyone at work seems to agree that a small visual glitch is acceptable, but the tiniest audio glitch is very distracting
@McGlaspie, that explains the awful 3D positioning of the sounds, but I do believe it got worse with an NS2 version from about a year ago; it was the first thing I noticed when I started up NS2 again a few weeks back. Your post is also quite a good insight into why "sound" engines are vastly underestimated in terms of how much raw processing power they require. I mean, CPUs have become a lot faster compared to 1998-2000, yet simulating sound in software still takes quite a few CPU cycles, even today...
Thanks for explaining it btw, very interesting read!
moultano | Creator of ns_shiva. | Join Date: 2002-12-14 | Member: 10806 | Members, NS1 Playtester, Contributor, Constellation, NS2 Playtester, Squad Five Blue, Reinforced - Shadow, WC 2013 - Gold, NS2 Community Developer, Pistachionauts
edited November 2017
@McGlaspie It seems like this sort of thing could be cached, as the path through the nav mesh to any far-away sound emitter changes very little from frame to frame. Things are either (1) in the room and updating quickly, or (2) not in the room, and reachable through one of a small number of exits that change slowly.
Roughly speaking, you'd trace to everything in the cache as "in room" (within n units on the nav mesh, or within LoS) on every frame. 1/50th of everything not known to be in the room (i.e. not in the "in room" cache) would be handled each frame, so each of those gets refreshed every 50th frame. Things would only move from "not in room" to "in room" when processed at this delayed rate. Sounds would be played using the current data whenever they occur, which for far-away things could be stale by up to 50 frames.
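In code it'd be something like this (all names, thresholds, and the 50-frame figure purely illustrative):

```cpp
// Sketch of the caching scheme described above: everything "in room" gets
// re-traced every frame, everything else is round-robined so each far-away
// emitter is refreshed roughly every 50 frames.
#include <cstddef>
#include <unordered_set>
#include <vector>

struct Emitter { int id; /* position, cached occlusion/volume, ... */ };

bool nearOrVisible(const Emitter &) { return false; } // stand-in: nav-mesh distance or LoS test
void retrace(Emitter &) {}                            // stand-in: the expensive path/occlusion query

void updateEmitters(std::vector<Emitter> &all, std::unordered_set<int> &inRoom,
                    std::size_t frame)
{
    for (std::size_t i = 0; i < all.size(); ++i)
    {
        Emitter &e = all[i];
        if (inRoom.count(e.id))
        {
            retrace(e);                        // "in room": refreshed every frame
            if (!nearOrVisible(e)) inRoom.erase(e.id);
        }
        else if (i % 50 == frame % 50)         // far away: 1/50th handled per frame
        {
            retrace(e);                        // result may be stale by up to 50 frames
            if (nearOrVisible(e)) inRoom.insert(e.id);
        }
    }
}
```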
The problem with a cached and/or deferred update approach is that the dB level of sounds would suddenly jump. So the volume of a sound event would look something like this (yay, ascii "graph"):
_____
     |_____
           |______
                  |___
Which would be pretty jarring. Also, FMOD doesn't provide a straightforward method to dynamically (via API calls) change properties of an event at run-time. The entire library is designed so the sound event definitions themselves dictate that behavior, and not external code controlling FMOD. As a result, using cached data (external to FMOD) to augment the properties of an event isn't a viable option. And attempting to dynamically control Sound Events externally to FMOD has a huge performance cost (I suspect due to FMOD's internal caching and channel management routines). This issue is a big part of why my previous attempts at augmenting what we have were not performant.
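To picture it with toy numbers: below, "stepped" is what the deferred cache alone gives you, and "smoothed" is the per-frame interpolation you'd need to hide the jump, which is exactly the kind of per-event, per-frame external control that turned out to be so expensive to push through FMOD. Purely illustrative, not engine code.

```cpp
// Toy numbers only: an occlusion-driven volume refreshed every 50 frames
// (staircase) versus a per-frame smoothed value that hides the step.
#include <cstdio>

int main()
{
    const int kRefresh = 50;
    float cachedTarget = 1.0f;  // value from the deferred occlusion pass
    float smoothed     = 1.0f;

    for (int frame = 0; frame < 150; ++frame)
    {
        if (frame % kRefresh == 0 && frame > 0)
            cachedTarget *= 0.5f;                      // occlusion got worse: big step down

        smoothed += (cachedTarget - smoothed) * 0.1f;  // per-frame lerp hides the step

        if (frame % kRefresh == 0)
            std::printf("frame %3d  stepped=%.2f  smoothed=%.2f\n",
                        frame, cachedTarget, smoothed);
    }
    return 0;
}
```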
I'm not sure what the granularity of sound events is, but what if you only use the new data for new events? More latency again, but no jumps in mid sound. How does this work currently? If you are getting closer to something mid-sound-event, does it actually get louder mid-sound-event?
(Also, hopefully the deferred sounds would be all far away enough that their exact parameters 1) change slowly. 2) don't matter much.)
Nintendows | Join Date: 2016-11-07 | Member: 223716 | Members, Squad Five Blue
I created a new version of ns2_tram with some experimental occlusion geometry that seems to reduce the issue of hearing things from far-away rooms, for example from north tunnels/platform to mezzanine, from shipping to south tunnels, and from ore processing to warehouse, to name a few.
It's a tedious job to add these "extra layers" of occlusion, but it shows that a mapper can use the current system and provide a little extra information to improve the occlusion problem. Perhaps in the future mappers could have the ability to set the "density" of a wall, or perhaps the engine could somehow measure the thickness of a wall to determine how much it muffles/occludes the sound.
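As a purely hypothetical sketch of that "measure the thickness of a wall" idea (none of this exists in Spark; the trace helper is a stand-in for whatever ray query the engine could expose):

```cpp
// Hypothetical only: turn "how much solid wall is between source and listener"
// into an occlusion factor. solidDistanceAlongRay() is an assumed helper, not a
// real engine function.
#include <algorithm>

struct Vec3 { float x, y, z; };

float solidDistanceAlongRay(const Vec3 & /*from*/, const Vec3 & /*to*/)
{
    return 0.4f; // stub: imagine pairing up ray entry/exit hits against world geometry
}

float wallOcclusion(const Vec3 &source, const Vec3 &listener)
{
    const float thickness = solidDistanceAlongRay(source, listener); // metres of solid material
    const float metresForFullMuffle = 2.0f;  // tuning knob: ~2 m of wall = fully muffled
    return std::min(1.0f, thickness / metresForFullMuffle); // 0 = clear, 1 = fully occluded
}
```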
@antou did you try the videos in my spoiler tag with headphones, especially the A3D demo?!
I just did:
HL: pretty much stereo to me
Quake: during the parts with A3D ON, I can hear some phasing effects, so something is happening... but I don't hear much difference in terms of positioning
A3D demos: I thought this one worked for me! Until I closed my eyes, and then, without visual cues, it was just stereo for my brain x)
Barber shop: this one I liked a lot! Maybe it's because the sounds are less abstract, or because I made up a scene in my mind, but it seems to have worked best.
Also the guy in the HL and Quake videos is really, really bad
Thanks for sharing the videos.
Kouji_San | Sr. Hive Upkeeper - EUPT Deputy | The Netherlands | Join Date: 2003-05-13 | Member: 16271 | Members, NS2 Playtester, Squad Five Blue
Well, it kinda goes to show that it's the combination of visual and audio that has to force your brain to, well, brain. And yeah, he was indeed very bad. Really rage-inducing lack of situational awareness.
So, to summarize all the posts, we have a game that ENORMOUSLY depends on visual and audio cues, but is essentially too broken to be fixed, like ever. Well, that's joyful... The reason I made this thread is that, despite superior hardware running the game, I'm experiencing sound skipping when a lot of action is happening. It's literally the same as when you exceed the number of sound buffers and the sound card/game just starts truncating stuff to make space for new audio events. If it's purely CPU-based, then my sound card plays no role here, which makes me question what's wrong with the audio queue and sound buffers. Are they too small? Is there a queue problem where the engine flushes events in the wrong order to make space for new ones? What is it? Surely something within the engine allows you to fiddle at least a bit. As a user, I have ZERO control over it other than volume, and the console doesn't have anything either. But I'm certain that can't just be the end of it. The game can't be THAT primitive, because I've seen more advanced audio in System Shock 2, from 1999...
And secondly: if this were a game where sound was just decorative, whatever. But in a game where audio cues mean the difference between a dead marine and a dead alien, it's something worth investing almost any amount of time into. Broken audio in NS2 means broken gameplay, because the gameplay actively depends on it. And you don't want broken gameplay in a game where gameplay is literally above everything else (I mean, it's a team-based shooter with another layer of strategy with the commander on top; it's a delicate and complex game system). This stuff matters, and it matters A LOT.
Also try this one, heck it might work for you. It's a bit different, but the idea is the same: using two microphones to calculate the 3D positioning.
https://youtube.com/watch?v=8IXm6SuUigI