Sound in the Mise-en-jeu: Conveying Meaning through Videogames’ Mediated Space

João Pedro Ribeiro

Faculty of Engineering, University of Porto, Portugal

Miguel Carvalhais

INESC TEC / Faculty of Fine Arts, University of Porto, Portugal

Pedro Cardoso

Faculty of Engineering and Faculty of Fine Arts, University of Porto, Portugal


Mise-en-jeu is the ontological equivalent of film’s mise-en-scène. As such, mise-en-jeu is a cinematographic language through which game designers communicate. It offers designers the ability to create and shape the aesthetics of videogames’ mediated space, the space of the cinematographic presentation.1
Our prior work on mise-en-jeu focused on the visual aspects of videogames. With that in mind, starting with an analysis of mise-en-scène, this paper provides an understanding of how sound is relevant for meaning-making through mise-en-jeu. Since videogames make use of some of the filming techniques of motion pictures, we first studied practitioners and academics in the history of film, approaching videogames afterwards.
The results of this research show that sound in mise-en-jeu allows designers to provoke emotions in players and to assist those players in formulating meaning as intended by the designers. We also found that mise-en-jeu allows for the deconstruction and interpretation of the characteristics of various variables of videogames’ mediated space. Therefore it allows us to understand better the relationship between videogames as audiovisual artefacts and the potential meanings that emerge from playing them.

Keywords: Sound, Mise-en-jeu, Videogames, Design, Cinematography.

1. Introduction and Research Methods

All elements of the filmic frame convey meaning. A concept, termed mise-en-scène, is used to study and deconstruct cinematic artefacts. Mise-en-scène is composed of five essential elements of analysis: Décor, Lighting, Space, Costumes, and Acting (Barsam and Monahan 2010). One hypothesis we test is that sound can also be part of the mise-en-scène because it, too, conveys meaning in the cinematic apparatus.

Mise-en-jeu exists in videogames as a phenomenological correspondence of mise-en-scène in film. It is the language of videogames’ mediated space. To analyse it, we have proposed a “framework that addresses the lack of a visual grammar in videogames” (Ribeiro et al. 2018, 86). Intrinsically, our mise-en-jeu framework studies and describes the mediation language used by videogame developers to influence their public. Such a framework provides videogame creators with the knowledge to establish and structure the artistic values in the mediated space of videogames.

If sound serves as a confirmation of theories on the visible, and as a confirmation that an area looks and sounds equal, it intrudes on a discourse which refers to a specific person. It establishes and states their singularity, distinguishing that person amongst other people (Doane 1980, 52). Within discussions regarding sound representation, realism challenges intelligibility. When sound representation claims are recognized, speech intelligibility disappears in many cases. The issue is like the connection between foreground speech and background sound. For instance, if key characters have a conversation in a crowded virtual space, the mimetic strength of the crowd’s uproar is lowered in support of the intelligibility of the pair’s conversation. Such trade-offs, done in support of intelligibility, show the theoretical shift in realism (52-3). Mediated space’s sound acts under variability between realism limits: “that of the psychological (or the interior) and that of the visible (or the exterior).” The reality of a character, from within the interior domain of a character, is a fact affirmed through the advent of sound (ibid.).

Music tends to correspond to the mood and activity of the frame. Once mimesis is not contemplated on the verifiable and phenomenologically described world, that mimesis is transferred to the degree of corresponding contrasting tangible layers of signification (55). When linked up, sound and picture suggest the humanisation of the player character, as they can look and sound similar to real people, and express similar emotions. Understanding of the interior existence of a character may be set by the breadth and extemporization of their dialogue, improved by the florid approaches of music and sound (ibid.).

Methods of audio mixing turn sound into the carrier of significance – and significance is not encompassed in the concept of the visible. The philosophical reality of the soundtrack comprehends a richness that goes beyond visual motives. The ear is the tissue that opens our interior, which is not invisible but is nevertheless “unknowable within the guarantee of the purely visible” (56).

To test our hypothesis that sound is an element of mise-en-scène and, consequently, of mise-en-jeu, we used ethnography-derived methods, framed by bibliographical research (Roberts 2002, 6). As for bibliography, we resorted to “documents of life” (Plummer 1983),2 specifically existing research interviews previously published by other authors. We adopted this method as there are several published books and video recordings with filmmakers and videogame designers that have been overseen by experts. Some of the interlocutors are part of creative teams in their medium, while others are academics, but all are references in their respective fields, which legitimises their claims. Additionally, we resorted to a conventional bibliographical investigation of academics who are experts in the areas of study.

One characteristic of ethnographic studies is that they examine few cases (Flick 2002). We took this limitation into consideration in our research, opting to study four references in each medium, rather than focusing on more authors of lower renown. Despite its size, we believe that our sample is relevant since, instead of qualitative in-depth interviews of a random population, we resorted to key informants. They expand our understanding as researchers and provide “rich detail about a phenomenon, community or case” (Cossham and Johanson 2019). We observe limitations in reading archived materials and regarding them as transmissible knowledge. All people have their biases and points of view (Green and Bloome 1997, 197). We acknowledge that while bias is unsought, it is also unavoidable, and we tried to detect and correct it whenever possible.

2. Research Results

2.1 Film Studies

2.1.1 Walter Murch – Film Editor and Sound Designer

According to Walter Murch, whenever a film’s music comes in, the audience experiences an “emotional equivalent of a cutaway”. He argues that sound and music function as a mixer of specific emotions and allow filmmakers to plan and provoke those emotions in the audience. When there is no music, it is because filmmakers use that absence as a device to make a specific and clear comment on something, without manifesting it through the visual discourse (in Ondaatje 2002).

Murch claims that the audio mixing stage is the final phase in which filmmakers can solve an insoluble problem. A specific mix of music and sound can provide a solution for a problem that could not be solved in the production and post-production phases of the visual discourse of the film. It is an essential part of the filmmaking process. Each stage of a film’s production ends with unresolved issues that are resolved in the next stages (ibid.).

As an illustration of the correct usage of music and sound, Murch provides us with a scene: “Look at this gun! The gun hits the ground, and then the music finally comes in” (ibid.). He states that, in the scene, music is a representative and conductor of the emotion created previously, in place of the apparatus for the creation of emotion itself. In The Godfather (1972), music was used in this sense. Murch believes this method elicits feelings in the spectator that are more real since they stem from the direct contact spectators establish with the scene itself. Spectators’ sentiments concerning a scene are directly related to the mix of image and sound, not the genre or type of music (ibid.).

Murch concludes by stating that, based on his personal experience, the majority of films use sound and music in the same sense that professional sportspeople use synthetic corticosteroids. “It gives you an edge, it gives you a speed, but it’s unhealthy for the organism in the long run”, meaning that it must be used in a planned and coherent manner (ibid.).

2.1.2 Hans-Jürgen Syberberg – Film Director

Hans-Jürgen Syberberg states that he learned to exercise the usage of sound and video differently, addressing the audience through the sonic environment (Cardullo 2017, 103). In Our Hitler (1977), Syberberg introduces Hitler as a consequence of the cinematic era, making a limited reference to Siegfried Kracauer’s analogous opinion of the Führer in the book From Caligari to Hitler (1947).3 As an analogy, at the beginning of Our Hitler, a child walks onto a set exhibiting a projection from The Cabinet of Dr. Caligari (1919). Also projected are other German films. Confronted with this backdrop, the child puts a toy puppy, which has the face of Hitler, into a crib. As the infant puts the Hitler-puppy to bed, we hear Hitler’s voice from a recording of a speech delivered in 1932 (Cardullo 2017, 109). Here, we can observe the usage of sound to convey meaning in the film which the image alone could not convey.

When confronted by the argument that his monologues would function better in theatre due to a supposed greater dependence on words in place of visuals, Syberberg contends that the interviewer is mistaken. To support his position, Syberberg references Apocalypse Now (1979), alleging that Coppola efficiently used sound in the film since it surrounds the viewer, making it essential in conveying meaning in the film.4 Syberberg wanted to use those technologies in Our Hitler, in order to surround the viewer with layers of sound, something that would allow him to communicate his message better. Elements of sound analogous to the surround effect in Apocalypse Now exist in Our Hitler, but as an undeveloped concept since Syberberg did not have the budget to use the same technologies (Cardullo 2017, 111).

Syberberg had the chance to watch Apocalypse Now before the sound was mixed. At that stage, it consisted only of what he calls noise. He argues that the film and its visual aspects only worked after the sound became multi-dimensional and fully developed. The contrast between these two stages revealed “how sound technology can change an entire film.” Syberberg concludes the interview by stating that he believes that sound design in film should have more detail. “The future of film lies in the development of sound technology” (ibid.).

2.1.3 Philip Halsall – Film Researcher

Philip Halsall starts by quoting David Lynch: “Every time I hear sounds, I see pictures. Then, I start getting ideas. It just drives me crazy” (in Halsall 2002), and “Films are 50 percent visual and 50 percent sound. Sometimes sound even overplays the visual” (Cable 1998).

According to Halsall, these statements sum up David Lynch’s philosophy of film, and that philosophy is reflected in his films. Since Eraserhead (1977) drew attention to visuals and audio in films, Lynch has been trying to deliver some of the most visually and sonically compelling films. The abstract nature of sound reinforces Lynch’s vague storytelling method. However, his rhythms and passages of music appear to command the mood he wants (Halsall 2002).

Music, singing, speech and abstract sounds all come together on and off the screen to build a structure that is used by Lynch to enhance the story and mood of his films. Significant meaning can be transformed or manipulated with the use of specific sounds. Any effect can also be reversed with the use of auditory cues. Audio assists the image, and image assists audio. Nevertheless, they enter Lynch’s films in a combination rarely seen in the works of other filmmakers and create elegance and suspense. Lynch considers sound to be an invisible protagonist, integral to the story, and an unseen voice which creates confusion in the background of his films as much as it elucidates it (ibid.).

Lynch arguably came to define the function of music and sound in the production of contemporary films. His construction of concepts and his ability to produce unique case studies of sound design when making a film have now become his signature. Through his research, Halsall came to understand that Lynch’s unwavering desire for lavish audio-visuals has made it possible for many sonically engrossing films to exist. Lynch’s works are recognized in comparison to what the viewer feels visually and sonically, with their unique approach to the role of sound and music (ibid.).

2.1.4 James Wierzbicki – Film Researcher

James Wierzbicki states that, as with most moviemakers, Alfred Hitchcock’s works include both the sounds which the characters hear and the sounds which only the physical members of the audience can hear. Film-makers have defined the sounds of the first type as source music (since the origin is visible, or at a minimum implicated, in the scene). However, academics often call it diegetic music (Wierzbicki 2012, 6) because it seems to occur within the cinematic narrative, or diegesis – simply put, the characters in the film can hear the music the audience hears (Rea and Irving 2015, 323).

In the movie industry, the second kind of music has been referred to as underscore from the 1930s onwards (since such music does not contribute to, or is external to, the film’s story or diegesis); in the literature, this music is referred to as non-diegetic or extradiegetic. Be it source music or underscore, diegetic or extradiegetic, the type of musical content across Hitchcock’s films is not relevant to his sound design. In terms of sonic style, it is what Hitchcock does with those sounds that counts (Wierzbicki 2012, 6).

Hitchcock’s use of underscore stemmed from assumptions on when to apply it or not, and he was a typical mainstream director in these decisions. There are exceptions to such a blanket statement. Lifeboat (1944) and The Birds (1963) contain no underscore, and notable segments in several of his films – for instance, the continuous exploration of an ominous windmill in Foreign Correspondent (1940), the aeroplane scene in North by Northwest (1959), and the dramatic police action in Family Plot (1976) – lack the mood-highlighting dramatic music which is the hallmark of the well-known classical style of film. In most cases, Hitchcock’s use of underscore remains close to that of conventional films in the classical style (Wierzbicki 2012, 7).

Conversely, Hitchcock’s use of diegetic sounds was progressive and unorthodox. Alongside Blackmail (1929), the most famous of his pre-Hollywood films are those in the Gaumont-British crime drama sextet. Four of these films – The Man Who Knew Too Much (1934), The 39 Steps (1935), Young and Innocent (1937), and The Lady Vanishes (1938) – include diegetic sounds which are not only related to the stories but also give viewers indications of how the plots will proceed. Perhaps nowhere else in Hitchcock’s works is diegetic music so central to the story as in The Man Who Knew Too Much, where a climactic occurrence of an extensive cantata acts as a catalyst for attempted murder (Wierzbicki 2012, 7).

As described by Jack Sullivan, Hitchcock’s definition of music:

encompassed street noise, dialogue (especially voice-over), sounds of the natural world (…) sonic effects of all sorts (…) silence, the sudden, awesome absence of music, capable of delivering the most powerful musical frisson of all (2006, xv).

Hitchcock would have accepted this. Before his death, an interviewer indicated that he had considered music as just yet another sonic item, whose inclusion in a movie needed thoughtful consideration (Wierzbicki 2012, 8). Hitchcock himself said that “when you put music to film, it’s really sound; it isn’t music per se” (Counts 1980, 29).

Many academics are likely to reverse this assertion and say that if we put sounds in a film, they become music; they are no longer sound by themselves. Such assumptions invite a digression into the musical genre known as musique concrète (Wierzbicki 2012, 9).

2.2 Game Studies

2.2.1 Scott Martin Gershin, Russell Brower, Tommy Tallarico, and Pedro Seminario – Videogame Sound Designers

Talking about the sounds from Space Invaders (1978), specifically the one that starts when the time to clear the level is nearly up, Scott Martin Gershin, Russell Brower, Tommy Tallarico, and Pedro Seminario describe it as short in length but large in effect. They argue it helps to create tension in players because it indicates that they are about to run out of time. In practice, the videogame’s sound designers created a song which replays more quickly as time goes by, with the rhythm adjusted to change along with the pitch. This increases the pace of the videogame since the sound’s rhythm is synchronised to the human heartbeat. When the rhythm increases, so does the player’s heartbeat, creating a sense of urgency. The authors propose that, with this game, a bit of psychoacoustics, and the sound of instability, found their way into game design (2017).
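
The accelerating rhythm described above can be illustrated with a minimal sketch. It is not taken from Space Invaders or from Gershin et al.; the function name, the parameters, and the linear mapping from remaining time to the pause between notes are all assumptions made for illustration, and the accompanying change in pitch is left out.

```python
def beat_interval(time_left, time_limit, slowest=1.0, fastest=0.2):
    """Return the pause, in seconds, between two notes of the loop.

    Hypothetical mapping: the pause shrinks linearly from `slowest`
    (full time remaining) to `fastest` (no time remaining).
    """
    # Clamp the remaining fraction of time to the range [0, 1].
    fraction_left = max(0.0, min(1.0, time_left / time_limit))
    return fastest + (slowest - fastest) * fraction_left


if __name__ == "__main__":
    # Print how the pause between notes shortens as a 60-second level runs out.
    for seconds_left in (60, 45, 30, 15, 0):
        print(seconds_left, round(beat_interval(seconds_left, 60), 2))
```

Under these assumptions the notes are played closer and closer together as the level runs out, which is the kind of rising tempo the designers associate with tension.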

Referring to Duck Hunt (1984), Gershin et al. explain that videogames taunt the player through sound. They argue that the sound was designed with the express intention of making players want to fire at the dog character when they fail. As players shoot as much as they can, the sound of progress and success emerges. As such, sound can often be the ideal tool for convincing players that they were successful. The same happens when they lose, through the use of dissonant sounds. These are sounds used to mediate empathy within play. It is the psychology of giving players that sentiment, as they start low and go high (2017).

The sounds of collecting and losing coins in Sonic the Hedgehog (1991) are like those in slot machines, replicating the psychological effect of winning or losing real money. Since that effect is efficient, videogame designers have tried to imitate that sound in various ways, when creating sound effects which are reproduced when players gather an item. Sound effects like those are the hardest to invent from a sound design standpoint but are some of the most effective in the mediated space (Gershin et al. 2017).

Final Fantasy (1987-Present) is a videogame series that is known for its music. Its soundtrack and sound effects give players the feeling of celebration and constant triumph. To achieve this, designers used instruments like fanfare trumpets because of their emotional association with festivity (ibid.).

In Metal Gear (1987) a siren-like sound plays when players get spotted by enemy non-player characters. Hideo Kojima – the game’s designer – planned this sound effect so that the decay cuts short, sending players into a state of panic. This effect proved to be efficient, and it is still present in videogames such as Metal Gear Solid V: The Phantom Pain (2015), which has its sound design rooted in past entries in the series (Gershin et al. 2017).

The authors conclude by recommending that, when designing a sound, it is best to try to figure out that sound’s purpose, because it will have a meaning to the player and it will mediate player experience. Along with the visual elements of a videogame, music and sounds allow designers to tinker with the player’s feelings (ibid.).

2.2.2 Darren Korb – Sound Designer and Composer for Games

The composer of Bastion (2011), Darren Korb, explains that the idea for the videogame’s dynamic voiceover from a narrator came from experimentation with storytelling strategies, to give players a sense of what is going on and tell them a story, without interrupting regular play by placing storytelling elements. Korb is very fond of Diablo II (2000). However, he admits that he did not want to copy its methods of storytelling, in which players have to read through a “giant wall of text for five minutes” in order to know the characters’ backstory. Korb explains that he took Diablo II as an example of how he did not want to tell a story and, to avoid making that mistake, he used sound to integrate everything (2016).

He admits that “music led, for most of the early part of the process – the design, the art, and the writing – in terms of tone”, having a substantial influence on the project. He mixed the sound of catapults into most of the sound effects, arguing that such a mix made them all more remarkable. He used this mixing technique for most of the weapon sounds, since, when it came to the sound effects, his main goal was to give feedback to players (ibid.).

His secondary goal was not to have any sounds that players could find obnoxious. Some sounds are going to be heard by players countless times throughout play sessions, so he wanted to ensure they could hear them numerous times without growing angry or becoming emotionally detached from the game. For this, he listened to each sound effect in a loop and made sure that he did not become detached while hearing it (ibid.).

The narration was the most critical feature “because it conveys story and context”. So, players must hear the narrator front and centre. As such, the narrator is in the middle of the stereo image and plays at a high volume. The music is next when it comes to loudness since it does “a lot to convey tone and the emotion of the moment”, and there are numerous “important moments in the game involving music and then sound effects”, the latter being there to give players feedback. However, he believes that sound effects are not as significant as narration and music (ibid.).

Korb used automation to highlight narration. That is why he used a technique called ducking, in which the volume of most of the audio decreases, but the voice stays on top (Marks 2012, 324). That helped the narration stand out and increased player immersion. He summarises this by stating that the entire purpose of this whole process and all the sound layers “is to enhance player immersion” (Korb 2016).
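
As a rough illustration of ducking, the following minimal sketch lowers the level of a background layer whenever a voice layer is audible and restores it afterwards. It does not reproduce Korb’s actual mix or any particular audio tool: the function names, the threshold, the ducked gain of 0.3 and the step size are hypothetical values chosen only for this example.

```python
def duck_gain(voice_active, current_gain, ducked_gain=0.3, step=0.1):
    """Move the background gain one step towards its target level.

    Hypothetical values: the background plays at gain 1.0 when the voice
    is silent and is lowered towards `ducked_gain` while the voice speaks.
    """
    target = ducked_gain if voice_active else 1.0
    if current_gain < target:
        return min(target, current_gain + step)
    return max(target, current_gain - step)


def mix(voice, background, voice_threshold=0.05):
    """Mix two equal-length lists of samples, ducking the background."""
    gain = 1.0
    out = []
    for v, b in zip(voice, background):
        # The voice counts as active when its sample exceeds the threshold.
        gain = duck_gain(abs(v) > voice_threshold, gain)
        # The voice is always added at full level, so it stays on top.
        out.append(v + gain * b)
    return out
```

Changing the gain in small steps rather than switching it instantly is a design choice of this sketch, intended to avoid an audible jump when the narration starts or stops.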

2.2.3 Bob Rehak – Game Researcher

Bob Rehak tells us that Spacewar! (1962) defined a group of critical functional elements of avatars which have since been found in most videogames. One of these functional elements is the existence of player avatars with limitations that deliberately shape the diegetic narrative which is visible on a screen, regarding avatars’ physical laws which are designed to simulate real life – that is, not only the meaning of the images of the videogame but also the sounds, concerning players’ realities. Audiovisuals portray guiding principles and conditions that regulate play (2003, 110).

The arcade’s commercial space, which had dark parts that were loud due to electronic sounds and graphical stroboscopic explosions, resembled a real version of those videogames. Arcade games had thematic features which were seen as satirical representations of themselves. Such claustrophobic sonic informant diegesis is observable in videogames like Asteroids (1979), Galaxian (1979), Centipede (1981), and Defender (1981) (Rehak 2003, 114-5).

Rehak notes that Myst (1993) used “ray-traced scenes of near-photographic quality.” However, since Myst emphasised spatial exploration over linear plot development, ambient sound effects were used to enhance the players’ feeling of immersion: “wind whistling through trees, waves washing up on shore, and mechanical objects that whirred and clicked” (2003, 116), in addition to the still frames that distinguished the topography of the island. It used non-diegetic music as well, comparable to a movie soundtrack, to create tension and suggest proximity to hints (ibid.).

In Quake (1996) the avatar’s somatic form,5 a material “vulnerability conveyed through multiple codes of representation”, overshadowed the visual qualities of the avatar. Players listened to the sounds of their footfalls and the breathing of their avatars. Game physics made player avatars bounce off walls and crouch to enter small places. The force of gravity stopped the avatars from jumping too far, which caused fatal falls from great heights. A shake of the camera combined with the sound effect of a groan or cough and a subsequent loss of health points indicated impact injuries. Players saw their own blood splash as avatarial damage intensified; death was implied by a collapsing of the avatarial camera, which lay motionless – nevertheless transmitting a visual and acoustic output to the mediated space – until resurrected with a button click (2003, 118).

The situational camera uncovers the intricacies of reality previously unimaginable as video material. Since the technique can convey almost any aspect of actual human life, its creation can herald a conceptual film where the camera exposes the human psyche, not vaguely, but genuinely in terms of not only image but also sound. It allows players to observe people both as others see and hear them, and as they see and hear themselves (Brinton 1947, 365).

2.2.4 Mark Grimshaw, Siu-Lan Tan, and Scott D. Lipscomb – Game Researchers

In Playing with Sound: The Role of Music and Sound Effects in Gaming (2013), Mark Grimshaw et al. examine the broad array of functions of sound and music in videogames. The authors noted that, except for videogames such as those in the music game genre, the majority of the music in videogames is pre-recorded and emulates film music, being a key element but not often the main focus. Alongside visuals, audio in cinematics helps to ensure game progression by hiding the loading of game levels into RAM and is used as non-diegetic music across gameplay (297).

By analogy with film scores, videogame scores are called non-diegetic. They contrast with videogames’ sound effects, which are diegetic and linked to the player’s actions and interpretation of the game world. Non-diegetic music simulates film’s underscoring. The technique of underscoring is used to express: a specific atmosphere (range, position through time, and energy); inner nature and character’s thoughts or emotions by using leitmotifs (musical themes connected to an object); and story (to focus attention, tempo, and pacing) (ibid.).

Videogame scholars use the terms diegetic and non-diegetic in ways that film researchers do not, illustrating the multifunctionality of most videogame sound, which is driven by the need for interactions with the game world. Videogame sound researchers have drawn up combinations of the words non-diegetic and diegetic to represent an increased number of gaming sound functions in contrast with film, considering the interactivity and the affordances of videogames and the possibilities resulting from these (ibid.).

Music can also influence the interpretation of players. When experiencing the designed-in thrilling soundtrack and sound effects of guns and screaming, both men and women found DOOM (1993) more aggressive than when played with no sound (Grimshaw et al. 2013, 307). Similar studies have shown that players’ interpretation of a game is affected by music. Jørgensen’s (2008) survey respondents did not know in advance that the audio would be silenced halfway into their gaming sessions. Their response to this change was that they felt it was challenging to play while lacking audio. Players thought they were losing control. One participant felt they were utterly clueless, while another likened this to losing a leg. They pointed out that the experience felt less engaging without any audio. In Counter-Strike (2000), participants noticed that sound often played a significant role in terms of how well they could perform (Jørgensen 2008, 166-7).

The sophistication of sound effects and music evolved alongside sound and memory innovations in videogames; at the same time, players became more involved in interacting with sound. In contemporary videogames, audio is multi-channel, real-time, and 3D, unlike the plain mono chirps of early videogames like Pong (1972) (Grimshaw et al. 2013, 308).

That growing engagement allows players to develop their efficiency, whether in just playing the game with no pre-established goals, finishing levels quicker, achieving a higher rating or joining the winning squad. In audio-only or music videogames, successful play requires the capacity to understand and locate the source of a sound effect in the sound field. In first-person shooter and survival horror videogames, combining the processing of acoustic and visual information to identify the source of danger is vital.

The consequences of sound in the performance of players and the contextual elements of play may depend on several variables, such as the structural features and rules of a videogame, the setup of the devices which play the game’s sounds, e.g. headphones, the connection between the game events and the actions of players, and the skills of the players (ibid.).

3. Conclusions

3.1 Summary of Contributions

3.1.1 Methods

In this paper, we adopted the use of accounts of culture as research methods, including biographical and bibliographical explorations, to the study of phenomena in the fields of film and game studies. Through research interviews, we used documents of life to validate our hypothesis.

With few cases and no direct contact with the sample units, this study focused on references of the media. The sample contains statements from critical actors, which improve our knowledge of the phenomena. We detected constraints in the interpretation of archived items – the authors have preferences and perspectives. We acknowledged this, and we only chose to include material that was consistent with most of the canonical bibliography in both fields of study.

3.1.2 Summary of Findings: Mise-en-Scène

This study hypothesises that sound can convey meaning in the mediated space of videogames and that it is a critical component in the mise-en-jeu. However, first, we proposed the verification of sound as part of film’s mise-en-scène.

Sound and music serve as combiners of different emotions, helping filmmakers to prepare and evoke feelings in the viewer. When there is no involvement of music, that is because the creators use the absence as a tool to create a precise and direct statement about something, without addressing it visually. Whenever a film’s score arrives at the viewer, it feels like an emotional version of a cutaway – the interruption of continuity.

The audio mixing and mastering phases are the final stages in which filmmakers may overcome an insoluble problem. A combination of music and sound may offer an answer to an issue that was not addressed in production and post-production. It is an integral aspect of the movie process. Music is a voice and a conductor of emotions. Spectators’ emotions regarding a scene are primarily linked to the combination of picture and sound, not the form or style of the score.

The use of sound can express something in a movie that the image alone does not express. Innovations in sound processing and design can transform a whole film. Often the audio magnifies the image. A substantial context may be shaped or distorted by utilizing different sounds. With the use of auditory stimuli, any effect may also be reversed. Audio supports the image, and the image helps the audio. Sound is an unseen narrator, a vital part of the plot, and an intangible voice that generates ambiguity in the context of a film as often as it elucidates it.

3.1.3 Summary of Findings: Meaning in Videogames’ Mediated Space

After confirming that sound is crucial in the communication and elicitation of emotion in film, we also confirmed that sounds are essential in the conveying of meaning through videogames’ mediated space.

We confirmed that sounds like music, voice and sound effects are crucial in provoking emotions in players and assisting them in formulating meaning as intended by videogame designers, as sound and music do “a lot to convey tone and the emotion of the moment” (Korb 2016).

Diegetic sounds relate to the visible and audible virtual worlds and the characters that inhabit them. Extradiegetic sounds concern background music and other elements that are not part of the action. Both diegetic and extradiegetic sounds are crucial in establishing and maintaining player immersion and giving legitimacy to the videogame’s magic circle – a space “in which normal rules of reality and ordinary life are suspended when one participates in games (or rituals)” (Nam 2019, 48).6

Players’ judgment of the videogame is assuredly influenced and mediated by sound. This mediation often occurs in connection with other, visual, elements of the mise-en-jeu, as both are always present. Their absence is an intended statement in itself, whether by the designers or by the players, as the latter can choose to change the volume of the sound output, effectively mediating the mise-en-jeu themselves. While such connections are certain, our research established that visuals do not always command the majority of the tone of the artefact, with examples such as Bastion (2011), in which the intended mood of the visuals derived from sound.

Sound in the mise-en-jeu offers players a sense of what is happening and tells them a story without disrupting regular play. Sound effects often provide feedback to players. Audiovisuals reflect the driving concepts and criteria that regulate games. Visual and auditory feedback in the mediated space can indicate several narrative elements that are not explicitly communicated to the player by text messages or narration.

Many videogame scores are pre-recorded and mimic film music, in that they are a crucial aspect but not always the emphasis of the mediated space. Alongside graphics, audio in cinematics helps ensure the advancement of the game. The complexity of sound effects and music grew alongside developments in sound and memory in videogames. In the same period, players became increasingly interested in interacting with sound. This growing dedication enabled players to improve their performance. The effect of sound on players’ success depends on a multitude of variables.

3.2 Future Work

As mise-en-jeu enables the dissection of the characteristics of the mediated space, defining variables emerging from sound will provide a broader framework. Our study also suggests that definitions of mise-en-scène, such as those proposed by Richard Meran Barsam and Dave Monahan (2010) and Louis Giannetti (2014), should include sound in their listings of components, rather than leaving sound as a side remark in their texts. Consequently, our definition of the mise-en-jeu must also include sound as a component with verifiable variables. Sound-related variables were not determined in the current study because they are out of scope; they must be addressed in future studies.

Each filmmaker, designer, and academic whom we studied agreed that sound is used to make a “clear comment on something” (Murch cited in Ondaatje 2002). The framework must describe such comments and how they can be analysed systematically, and future studies must determine that description. Suggestions include treating diegetic and non-diegetic sound as two distinct variables, as well as separating underscore, sound effects, and narration.

Our data support our hypothesis, but one finding was not expected: the correlation between time and sound and its effect on player perception. While many studies on the time-sound connection have been conducted in the field of physiology, we propose that their findings be discussed further, so that a global perspective on the biological effect of aesthetics on a player’s reaction can be transposed to an analysis of player sensation and attitude towards the mise-en-jeu.


1 The concept of the mediated space in videogames was introduced by Michael Nitsche in his five planes theory for the analysis of videogame spaces (2008), and was interpreted and described by Sercan Sengün as consisting “of the visual outlet of the game and mostly breeds cinematic and visual studies” (2015, 186-7).

2 At the heart of personal document research is the life story – an account of one person’s life in his or her own words. Life stories come through many blurred sources: biographies, autobiographies, letters, journals, interviews, obituaries. They can be written by a person as their own life story (autobiography) or as a fiction by themselves; they can be the story coaxed out of them by another, or indeed their ‘own story’ told by someone else (as in biography). They can exist in many forms: long and short, past and future, specific and general, fuzzy and focused, surface and deep, realist and romantic, ordinary and extraordinary, modernist and postmodernist. And they are denoted by a plethora of terms: life stories, life histories, life narratives, self stories, ‘mystories’, autobiographies, auto/biographies, oral histories, personal testaments, life documents (Plummer 2001, 18-9).

3 Siegfried Kracauer argues that:

through an analysis of the German films deep psychological dispositions predominant in Germany from 1918 to 1933 can be exposed – dispositions which influenced the course of events during that time (Kracauer 1947, v)

and which had “to be reckoned with in the post-Hitler era” (ibid.).

4 Apocalypse Now was one of the first movies to use the Dolby Stereo 70 mm Six Track technology (Kerins 2010, 1-9).

5 The hypothesis of the somatic marker, proposed by António Damásio and associated researchers, suggests that emotional processes direct actions, particularly decision-making. Somatic markers are emotionally linked feelings in the body (Arias et al. 2020, 202). Luke Hockley brought the term somatic from the neurosciences into cinema, defining somatic cinema as “the relationship between audience and films. More specifically, it examines how the body of the viewer and the cinema screen itself are interrelated sites of meaning” (2013, 1).

6 Johan Huizinga originally introduced the concept of the magic circle in Homo Ludens: A Study of the Play-Element in Culture (1949).


Arias, Juan A., Claire Williams, Rashmi Raghvani, Moji Aghajani, Sandra Baez, Catherine Belzung, Linda Booij, et al. 2020. “The Neuroscience of Sadness: A Multidisciplinary Synthesis and Collaborative Review.” Neuroscience and Biobehavioral Reviews 111 (April): 199-228.

Barsam, Richard Meran, and Dave Monahan. 2010. Looking at Movies: An Introduction to Film. W.W. Norton & Co.

Brinton, Joseph P. 1947. “Subjective Camera or Subjective Audience?” Hollywood Quarterly 2 (4): 359-66.

Cardullo, R. J. 2017. Hans-Jürgen Syberberg, the Film Director as Critical Thinker: Essays and Interviews. SensePublishers.

Cossham, Amanda, and Graeme Johanson. 2019. “The Benefits and Limitations of Using Key Informants in Library and Information Studies Research.” Proceedings of RAILS – Research Applications Information and Library Studies, 2018, Faculty of Information Technology, Monash University, 28-30 November 2018. Information Research 24 (3).

Counts, Kyle B. 1980. “The Making of Alfred Hitchcock’s The Birds: The Complete Story Behind the Precursor of Modern Horror Films.” Cinefantastique 12 (2).

Doane, Mary Ann. 1980. “Ideology and the Practice of Sound Editing and Mixing.” In The Cinematic Apparatus, 47-60. London: Palgrave Macmillan UK.

Flick, Uwe. 2002. “Qualitative Research – State of the Art.” Social Science Information 41 (1): 5-24.

Gershin, Scott Martin, Russel Brower, Tommy Tallarico, and Pedro Seminario. 2017. “Classic Video Game Sounds Explained by Experts (1972-1998) | Part 1.” WIRED. 2017.

Giannetti, Louis D. 2014. Understanding Movies. 14th ed. Pearson.

Green, Judith, and David Bloome. 1997. “Ethnography and Ethnographers of and in Education: A Situated Perspective.” In Handbook of Research on Teaching Literacy through the Communicative and Visual Arts, 181-202. Lawrence Erlbaum Associates, Inc.

Grimshaw, Mark, Siu-Lan Tan, and Scott D. Lipscomb. 2013. “Playing with Sound: The Role of Music and Sound Effects in Gaming.” In The Psychology of Music in Multimedia, 289-314. Oxford University Press.

Halsall, Philip. 2002. “The Films of David Lynch: 50 Percent Sound.” The British Film Resource. 2002.

Hockley, Luke. 2013. Somatic Cinema: The Relationship between Body and Screen – a Jungian Perspective. Taylor & Francis.

Huizinga, Johan. 1949. Homo Ludens: A Study of the Play-Element in Culture. Routledge & Kegan Paul.

Jørgensen, Kristine. 2008. “Left in the Dark: Playing Computer Games with the Sound Turned Off.” In From Pac-Man to Pop Music: Interactive Audio in Games and New Media, edited by Karen Collins, 1st Ed, 163–76. London: Routledge.

Kerins, Mark. 2010. Beyond Dolby (Stereo): Cinema in the Digital Sound Age. Indiana University Press.

Korb, Darren. 2016. “Build That Wall: Creating the Audio for Bastion.” GDC. 2016.

Kracauer, Siegfried. 1947. From Caligari to Hitler: A Psychological History of the German Film. Princeton University Press.

Marks, Aaron. 2012. The Complete Guide to Game Audio: For Composers, Musicians, Sound Designers, Game Developers. CRC Press.

Monster Cable. 1998. “The Monster Meets Filmmaker David Lynch.” LynchNet, 1998.

Nam, Su Hyun. 2019. “Rules of Videogames and Controls in Digital Societies.” In Videogame Sciences and Arts. VJ 2019. Communications in Computer and Information Science, 46-56. Springer, Cham.

Nitsche, Michael. 2008. Video Game Spaces: Image, Play, and Structure in 3D Game Worlds. 1st ed. MIT Press.

Ondaatje, Michael. 2002. The Conversations: Walter Murch and the Art of Editing Film. Bloomsbury Publishing PLC.

Plummer, Kenneth. 1983. Documents of Life: An Introduction to the Problems and Literature of a Humanistic Method. G. Allen & Unwin.

Plummer, Kenneth. 2001. Documents of Life 2: An Invitation to a Critical Humanism. Sage Publications.

Rea, Peter W., and David K. Irving. 2015. Producing and Directing the Short Film and Video. Taylor & Francis.

Rehak, Bob. 2003. “Playing at Being: Psychoanalysis and the Avatar.” In The Video Game Theory Reader, 103-27.

Ribeiro, João P., Miguel Carvalhais, and Pedro Cardoso. 2018. “Mise-En-Jeu: A Framework for Analysing the Visual Grammar of Platform Videogames.” In VJ2018 – 10th Conference on Videogame Sciences and Arts, edited by Miguel Carvalhais, Pedro Amado, and Pedro Cardoso, 86-108. Porto: i2ADS – Research Institute in Art, Design and Society, University of Porto, Faculty of Fine Arts.

Roberts, Brian. 2002. Biographical Research. Open University Press.

Sengün, Sercan. 2015. “Why Do I Fall for the Elf, When I Am No Orc Myself? The Implications of Virtual Avatars in Digital Communication.” Comunicação e Sociedade 27 (0):181.

Sullivan, Jack. 2006. Hitchcock’s Music. Yale University Press.

Wierzbicki, James Eugene. 2012. Music, Sound and Filmmakers: Sonic Style in Cinema. Routledge.


Apocalypse Now. (1979). Directed by Francis Ford Coppola. United States: United Artists.

Blackmail. (1929). Directed by Alfred Hitchcock. United Kingdom: Wardour Films.

Eraserhead. (1977). Directed by David Lynch. United States: Libra Films International.

Family Plot. (1976). Directed by Alfred Hitchcock. United States: Universal Pictures.

Foreign Correspondent. (1940). Directed by Alfred Hitchcock. United States: United Artists.

Lifeboat. (1949). Directed by Alfred Hitchcock. United States: 20th Century Fox.

North by Northwest. (1959). Directed by Alfred Hitchcock. United States: Metro-Goldwyn-Mayer.

Our Hitler. (1977). Directed by Hans-Jürgen Syberberg. West Germany; France; United Kingdom: Omni Zoetrope.

The 39 Steps. (1935). Directed by Alfred Hitchcock. United Kingdom: Gaumont-British Picture Corporation.

The Birds. (1962). Directed by Alfred Hitchcock. United States: Universal Pictures.

The Cabinet of Dr. Caligari. (1919). Directed by Robert Wiene. Weimar Republic: Decla-Bioscop.

The Godfather. (1972). Directed by Francis Ford Coppola. United States: Paramount Pictures.

The Lady Vanishes. (1938). Directed by Alfred Hitchcock. United Kingdom: United Artists.

The Man Who Knew Too Much. (1934). Directed by Alfred Hitchcock. United Kingdom: Gaumont-British Picture Corporation.

Young and Innocent. (1937). Directed by Alfred Hitchcock. United Kingdom: Gaumont Film Company.


Atari. 1972. Pong. Atari.

Atari Inc. 1981. Centipede. Atarisoft.

Atari Inc. 1979. Asteroids. Taito.

Blizzard North. 2000. Diablo II. Blizzard Entertainment.

Cyan. 1993. Myst. Brøderbund.

id Software. 1993. DOOM. GT Interactive.

id Software. 1996. Quake. GT Interactive.

Kojima Productions. 2015. Metal Gear Solid V: The Phantom Pain. Konami.

Konami. 1987. Metal Gear. Ultra Games.

Namco. 1979. Galaxian. Midway Games.

Nintendo R&D1. 1984. Duck Hunt. Nintendo.

Russell, Steve. 1962. Spacewar! Steve Russell.

Sonic Team. 1991. Sonic the Hedgehog. Sega.

Square; Square Enix. 1987-Present. Final Fantasy. Square; Square Enix; Nintendo.

Supergiant Games. 2011. Bastion. Warner Bros. Interactive Entertainment.

Taito. 1978. Space Invaders. Midway.

Valve. 2000. Counter-Strike. Sierra Studios.

Williams Electronics. 1981. Defender. Atari, Inc.