audioopy.ipus module

The search for IPUs

Definition

Inter-Pausal Units (IPUs) are defined as sounding segments surrounded by silent pauses of more than X ms. They are time-aligned on the speech signal. IPUs are widely used in large corpora to facilitate speech alignment and speech analyses such as prosody.

The search for IPUs is performed on a channel of a WAV audio file.

Overview of the detection method

The algorithm and its settings in a nutshell

fix a window length to estimate the RMS (default is 20 ms);
estimate the RMS values of the windows and their statistical distribution;
automatically fix a threshold value to mark windows as sounding or silent; this value can be set manually if necessary;
fix a minimum duration for silences and remove the silent intervals that are too short (default is 200 ms);
fix a minimum duration for IPUs and remove the sounding intervals that are too short (default is 300 ms);
tag the resulting intervals with # or ipu_i.

Evaluation of a threshold value

In a first stage, the method estimates the Root-Mean Square (RMS) value of each fragment of the channel. The duration of these fragment windows is 20 ms by default. The statistical distribution of the resulting RMS values is then analyzed to automatically estimate the most relevant RMS threshold Θ. This automatically estimated value may not be appropriate for some recordings, particularly low-quality ones; in that case it can be set manually.
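The windowed RMS estimate described above can be sketched in plain Python. This is a minimal illustration and not the AudiooPy implementation: the function name window_rms and its parameters are hypothetical, and the estimation of Θ from the RMS distribution is left out because the text does not specify it.

```python
import math

def window_rms(samples, framerate, win_len=0.02):
    """Return one RMS value per consecutive window of win_len seconds.

    Hypothetical helper: 'samples' is a plain list of amplitude values.
    """
    size = max(1, int(round(framerate * win_len)))
    values = []
    for start in range(0, len(samples), size):
        window = samples[start:start + size]  # the last window may be shorter
        values.append(math.sqrt(sum(s * s for s in window) / len(window)))
    return values

# Toy signal sampled at 100 Hz: each 20 ms window holds 2 samples.
rms = window_rms([0, 0, 0, 0, 3, -3], framerate=100, win_len=0.02)
print(rms)
```

The two first windows contain only zeros, so their RMS is 0; the last one has an RMS equal to the absolute amplitude of its samples.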

Get silence vs sounding fragment intervals

The RMS of each fragment window is compared to the threshold Θ: windows below the threshold are identified as silence and windows above it as sounding. Neighboring silent windows and neighboring sounding windows are grouped into intervals. Because the search focuses on the sounding segments, the silent intervals that are too short are removed first.

The minimum duration of a silence is fixed to 200 ms by default. Usual values are:

200 ms for French;
250 ms for English.
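The grouping of windows and the removal of silent intervals that are too short can be sketched as follows. This is an illustration under stated assumptions, not the library code: group_windows and drop_short are hypothetical names, a window whose RMS equals the threshold is treated as silent, and a too-short silence at the very beginning or end of the channel is simply relabeled like any other.

```python
def group_windows(rms_values, threshold):
    """Group consecutive windows into [is_sounding, first, last_exclusive] runs."""
    runs = []
    for i, value in enumerate(rms_values):
        sounding = value > threshold  # assumption: equal to threshold means silent
        if runs and runs[-1][0] == sounding:
            runs[-1][2] = i + 1       # extend the current run
        else:
            runs.append([sounding, i, i + 1])
    return runs

def drop_short(runs, kind, min_windows):
    """Relabel runs of the given kind shorter than min_windows,
    then merge them with their neighbours."""
    merged = []
    for sounding, first, last in runs:
        if sounding == kind and last - first < min_windows:
            sounding = not kind
        if merged and merged[-1][0] == sounding:
            merged[-1][2] = last
        else:
            merged.append([sounding, first, last])
    return merged

rms_values = [0, 0, 0, 5, 5, 0, 5, 5, 5, 0, 0, 0]
runs = group_windows(rms_values, threshold=1)
# With 20 ms windows, the default of 200 ms would be 10 windows;
# 2 windows are used here to keep the toy example short.
cleaned = drop_short(runs, kind=False, min_windows=2)
print(cleaned)
```

In this toy input, the single silent window between the two sounding runs is absorbed, leaving one sounding interval surrounded by two silences.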

Construction of the IPUs

The algorithm then re-groups the neighboring sounding intervals that became adjacent when the silences that were too short were removed. The resulting sounding intervals that are too short are removed in turn. This minimum duration is fixed to 300 ms by default and has to be adapted to the recording conditions and the speech style. Usual values are:

300 ms for read speech;
100 ms for conversational speech.

Finally, the algorithm re-groups the neighboring silent intervals that became adjacent when the sounding intervals that were too short were removed. The remaining sounding intervals are the IPUs we searched for. Silent intervals are marked with the symbol ’#’ and each IPU is marked with ’ipu_’ followed by its number.
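The last step can be sketched as below, working on intervals expressed in seconds. It is only an illustration: build_ipus is a hypothetical function, the input is assumed to be an alternating list of silent and sounding intervals, and the labels follow the ipu_i form given in the nutshell list.

```python
def build_ipus(intervals, min_ipu=0.3):
    """Turn (is_sounding, start, end) intervals into (start, end, label) tuples.

    Sounding intervals shorter than min_ipu seconds become silence,
    neighbouring silences are merged, then intervals are labelled.
    """
    merged = []
    for sounding, start, end in intervals:
        if sounding and end - start < min_ipu:
            sounding = False              # too short to be an IPU
        if merged and merged[-1][0] == sounding:
            merged[-1][2] = end           # merge with the previous interval
        else:
            merged.append([sounding, start, end])

    labelled = []
    count = 0
    for sounding, start, end in merged:
        if sounding:
            count += 1
            labelled.append((start, end, "ipu_%d" % count))
        else:
            labelled.append((start, end, "#"))
    return labelled

intervals = [
    (False, 0.0, 1.0),
    (True, 1.0, 1.1),    # 100 ms: shorter than the 300 ms default
    (False, 1.1, 2.0),
    (True, 2.0, 3.5),
    (False, 3.5, 4.0),
]
result = build_ipus(intervals)
print(result)
```

The 100 ms sounding interval is discarded, so the two silences around it are merged into a single one and only one IPU remains.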

Optional settings

Distributing this algorithm in a tool brought us feedback from users. It allowed us to improve the default values, and it led us to add the following two parameters:

systematically move the start boundary of all IPUs (default is 20 ms);
systematically move the end boundary of all IPUs (default is 20 ms).
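A possible reading of these two settings is sketched below. The direction of each move is an assumption, since the text does not state it: here the start is moved earlier and the end later, so that each IPU is slightly enlarged. The function shift_ipus and its parameter names are hypothetical.

```python
def shift_ipus(tracks, duration, shift_start=0.02, shift_end=0.02):
    """Move the boundaries of (start, end) tracks given in seconds.

    Assumption: start is moved earlier and end later, and both are
    clipped to the [0, duration] range of the channel.
    """
    shifted = []
    for start, end in tracks:
        new_start = max(0.0, start - shift_start)
        new_end = min(duration, end + shift_end)
        shifted.append((new_start, new_end))
    return shifted

shifted = shift_ipus([(0.01, 1.0), (2.0, 2.99)], duration=3.0)
print(shifted)
```

This sketch does not check whether an enlarged IPU overlaps its neighbour; with silences of at least 200 ms and moves of 20 ms that case should not occur, but larger values would require such a check.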

Code explanation

ChannelSilences

The ChannelSilences class is designed to detect and manage silences within an audio channel. It estimates the root-mean-square (RMS) values of audio frames over specified time windows to identify silent and sounding intervals. The class provides methods to set and get various parameters, refine silence boundaries, extract tracks, and filter silences based on duration and volume thresholds.

Main functionalities

Detect silences in an audio channel based on RMS values.
Filter silences based on minimum duration and volume thresholds.
Refine track boundaries for more precise results.
Extract tracks from the audio channel, excluding silences.

Example of use

>>> import audioopy.aio
>>> from audioopy.ipus.channelsilences import ChannelSilences
>>> audio = audioopy.aio.open("tests/samples/oriana1.wav")
>>> channel = audio.extract_channel(0)
>>> # Create the instance: estimate the rms values in windows of 20 ms length
>>> channel_silences = ChannelSilences(channel, win_len=0.02, vagueness=0.005)
>>> # Search for all the silences, comparing each rms to an automatically estimated threshold
>>> threshold = channel_silences.search_silences()
>>> print(list(channel_silences))
[(0, 26880), (27840, 29440), (36480, 37440), (50240, 51200), (52800, 56640), (63680, 65600), (71040, 102080), (111680, 113280), (120640, 121600), (124480, 127040), (128320, 132160), (137280, 138560), (151680, 182720), (193280, 194560), (196480, 197440), (199360, 200320), (204800, 206400), (209280, 214080), (216640, 217600), (218240, 221120), (224640, 225600), (227520, 230080), (231360, 284672.0)]
>>> # Keep only the silences lasting longer than a given duration
>>> channel_silences.filter_silences(threshold // 2, 0.250)
>>> print(list(channel_silences))
[(0, 27040), (72000, 102400), (152000, 183040), (231520, 284672.0)]
>>> # Get the (from_pos, to_pos) of the tracks lasting longer than a given duration
>>> tracks = channel_silences.extract_tracks(0.300, 0., 0.)
>>> print(tracks)
[(27040, 72000), (102400, 152000), (183040, 231520)]

SearchForIPUs

The SearchForIPUs class is designed to automatically determine the boundaries of Inter-Pausal Units (IPUs) and silences.

It extends the ChannelSilences class and uses various parameters to identify and refine the boundaries of these segments. The class provides methods to set and get thresholds, window lengths, and other parameters essential for accurate segmentation.

Main functionalities

Easily set and get various parameters such as the volume threshold, the minimum silence duration, and the window length.
Find the boundaries of the sounding segments in audio data.
Retrieve the audio data of the IPUs found.

Example of use

>>> import audioopy.aio
>>> from audioopy.ipus.searchfor import SearchForIPUs
>>> audio = audioopy.aio.open("tests/samples/oriana1.wav")
>>> channel = audio.extract_channel(0)
>>> # Create a SearchForIPUs instance
>>> search_ipus = SearchForIPUs(channel, win_len=0.02)
>>> # Set parameters
>>> search_ipus.set_vol_threshold(0)
>>> search_ipus.set_min_sil(0.25)
>>> search_ipus.set_min_ipu(0.3)
>>> # Get tracks in the time domain
>>> tracks = search_ipus.get_tracks(time_domain=True)
>>> print(tracks)
[(1.67, 4.52), (6.38, 9.52), (11.42, 14.49)]
>>> # Get frames of the second track
>>> frames = search_ipus.get_track_data([tracks[1]])

References

You are invited to cite one of the following references if you use this program in your research experiments.

They give details about the algorithm and the evaluations:

Brigitte Bigi, Béatrice Priego-Valverde (2019). Search for Inter-Pausal Units: application to Cheese! corpus. 9th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, Poznań, Poland. pp.289-293. https://hal.science/hal-02428485

Brigitte Bigi, Béatrice Priego-Valverde (2022). The automatic search for sounding segments of SPPAS: application to Cheese! corpus. Human Language Technology. Challenges for Computer Science and Linguistics, LNAI, LNCS 13212, pp. 16-27. https://hal.archives-ouvertes.fr/hal-03697808

AudiooPy 0.5
