Commit 4499f28

3.0.7 (#803)
1 parent 33b3127 commit 4499f28

23 files changed: +246 -72 lines changed

README.md

+1 -1
@@ -37,7 +37,7 @@ conda env create -n mfa-dev -f environment.yml
 Alternatively, the dependencies can be installed via:

 ```
-conda install -c conda-forge python=3.8 kaldi sox librosa biopython praatio tqdm requests colorama pyyaml pynini openfst baumwelch ngram
+conda install -c conda-forge python=3.11 kaldi librosa biopython praatio tqdm requests colorama pyyaml pynini openfst baumwelch ngram
 ```

 MFA can be installed in develop mode via:

ci/docker_environment.yaml

-2
@@ -12,8 +12,6 @@ dependencies:
 - pyyaml
 - dataclassy
 - kaldi=*=*cpu*
-- sox
-- ffmpeg
 - pynini
 - openfst=1.8.3
 - scikit-learn<1.3

docs/source/changelog/changelog_3.0.rst

+10 -1
@@ -5,13 +5,22 @@
 3.0 Changelog
 *************

+3.0.7
+-----
+
+- Added check for current version vs latest version on run
+- Added :code:`--final_clean` flag to clean temporary files at the end of each run, along with a :code:`--always_final_clean` flag for :code:`mfa configure`
+- Removed dependencies on :code:`sox` and :code:`ffmpeg` as audio loading is done through :code:`librosa` in :code:`kalpy`
+- Removed poorly aligned files in subset from further training
+- Fixed an issue where specified words for cutoff modeling
+
 3.0.6
 -----

 - Fixed an issue where alignment analysis would not produce data for speech log likelihood and phone duration deviation
 - Changed phone duration deviation metric to be maximum duration deviation rather than average across all phones in the utterance
 - Fixed a crash when an empty phone set was specified in phone groups configuration files
-- Fixed a crash when when using the :code:`--language` flag with values other than :code`japanese`, :code`thai`, :code`chinese` or :code`korean`
+- Fixed a crash when using the :code:`--language` flag with values other than :code:`japanese`, :code:`thai`, :code:`chinese` or :code:`korean`

 3.0.5
 =====

docs/source/user_guide/dictionary.rst

+15 -2
@@ -191,17 +191,30 @@ Modeling cutoffs and hesitations

 Often in spontaneous speech, speakers will produce truncated or cut-off versions of the following word or words. To help model this specific case, using the flag :code:`--use_cutoff_model` will enable a mode where pronunciations are generated for cutoff words matching one of the following criteria:

-1. The cutoff word matches the pattern of :code:`{start_bracket}(cutoff|hes)`, where :code:`{start_bracket}` is the set of all left side brackets defined in :code:`brackets` (:ref:`configuration_dictionary`). The following word must not be an OOV or non-speech word (silence, laughter, another cutoff, etc).
+1. The cutoff word matches the pattern of :code:`{start_bracket}(cutoff|hes)`, where :code:`{start_bracket}` is the set of all left side brackets defined in :code:`brackets` (:ref:`configuration_dictionary`). Optionally, you can specify the intended word via a hyphen within the brackets (e.g., :code:`<cutoff-cut>`). If a target word isn't specified, then the immediately following word will be used if it's not an OOV or non-speech word (silence, laughter, another cutoff, etc.).
 2. The cutoff word matches the pattern of :code:`{start_bracket}(cutoff|hes)[-_](word){end_bracket}`, where start and end brackets are defined in :code:`brackets` (:ref:`configuration_dictionary`). The :code:`word` will be used in place of the following word above, but needs to be present in the dictionary, otherwise the target word for the cutoff will default back to the following word.

-The generated pronunciations will be subsequences of the following word, along with an :code:`spn` pronunciation. For example, given an utterance transcript like "<cutoff> cut off" will have the following pronunciations generated for the `English (US) MFA dictionary <https://mfa-models.readthedocs.io/en/latest/dictionary/English/English%20%28US%29%20MFA%20dictionary%20v3_0_0.html>`_:
+The generated pronunciations will be subsequences of the following word, along with an :code:`spn` pronunciation. For example, consider an utterance transcript like

 ::

+<cutoff-off> with the <cutoff> <cutoff> cut off
+
+
+The following pronunciations will be generated for the `English (US) MFA dictionary <https://mfa-models.readthedocs.io/en/latest/dictionary/English/English%20%28US%29%20MFA%20dictionary%20v3_0_0.html>`_:
+
+::
+
+<cutoff> spn
 <cutoff-cut> spn
 <cutoff-cut> kʰ ɐ t
 <cutoff-cut> kʰ ɐ
 <cutoff-cut> kʰ
+<cutoff-off> spn
+<cutoff-off> ɒ f
+<cutoff-off> ɒ
+<cutoff-off> ɑ f
+<cutoff-off> ɑ


 .. _speaker_dictionaries:
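
The pronunciation listing in the hunk above can be read as every non-empty prefix of each target pronunciation, plus an `spn` entry. The following is a minimal sketch of that reading, not MFA's implementation: the function name `cutoff_pronunciations` and the lexicon format are hypothetical, and the ordering is chosen only to mirror the listing.

```python
from typing import Dict, List


def cutoff_pronunciations(
    target: str, lexicon: Dict[str, List[List[str]]]
) -> List[List[str]]:
    """Return candidate pronunciations for a cutoff of ``target``.

    Hypothetical helper: ``lexicon`` maps a word to its pronunciations,
    each given as a list of phones.
    """
    candidates = [["spn"]]  # a cutoff can always surface as spoken noise
    for phones in lexicon.get(target, []):
        # Longest prefix first, matching the order of the listing above
        for end in range(len(phones), 0, -1):
            prefix = phones[:end]
            if prefix not in candidates:
                candidates.append(prefix)
    return candidates


# Toy lexicon copied from the pronunciations shown in the documentation hunk
lexicon = {"cut": [["kʰ", "ɐ", "t"]], "off": [["ɒ", "f"], ["ɑ", "f"]]}
for word in ("cut", "off"):
    for phones in cutoff_pronunciations(word, lexicon):
        print(f"<cutoff-{word}>", " ".join(phones))
```

A target word missing from the lexicon yields only the `spn` candidate in this sketch; the documentation instead describes falling back to the following word.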

environment.yml

-2
@@ -13,8 +13,6 @@ dependencies:
 - pyyaml
 - dataclassy
 - kaldi=*=*cpu*
-- sox
-- ffmpeg
 - scipy
 - pynini
 - openfst=1.8.3

montreal_forced_aligner/abc.py

+44
@@ -9,6 +9,7 @@
 import contextlib
 import logging
 import os
+import re
 import shutil
 import subprocess
 import sys
@@ -28,6 +29,7 @@
 get_type_hints,
 )

+import requests
 import sqlalchemy
 import yaml
 from sqlalchemy.orm import scoped_session, sessionmaker
@@ -694,6 +696,30 @@ def cleanup(self) -> None:
 logger.error("There was an error in the run, please see the log.")
 else:
 logger.info(f"Done! Everything took {time.time() - self.start_time:.3f} seconds")
+if config.FINAL_CLEAN:
+logger.debug(
+"Cleaning up temporary files, use the --debug flag to keep temporary files."
+)
+if hasattr(self, "delete_database"):
+if config.USE_POSTGRES:
+proc = subprocess.run(
+[
+"dropdb",
+f"--host={config.database_socket()}",
+"--if-exists",
+"--force",
+self.identifier,
+],
+stderr=subprocess.PIPE,
+stdout=subprocess.PIPE,
+check=True,
+encoding="utf-8",
+)
+logger.debug(f"Stdout: {proc.stdout}")
+logger.debug(f"Stderr: {proc.stderr}")
+else:
+self.delete_database()
+self.clean_working_directory()
 self.save_worker_config()
 self.cleanup_logger()
 except (NameError, ValueError): # already cleaned up
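
The `dropdb` call in the hunk above passes `check=True`, so `subprocess.run` raises `subprocess.CalledProcessError` on a non-zero exit status, and that exception is not one of the types caught by the `except (NameError, ValueError)` shown in this hunk. As a sketch only, a cleanup step that should not abort the run could wrap the call as below. The helper name `run_cleanup_command` is hypothetical and not part of MFA, and the Python interpreter stands in for `dropdb` so the example runs anywhere.

```python
import logging
import subprocess
import sys
from typing import List

logger = logging.getLogger("cleanup_sketch")


def run_cleanup_command(command: List[str]) -> bool:
    """Run a cleanup command, logging failures instead of raising."""
    try:
        proc = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,
            encoding="utf-8",
        )
    except FileNotFoundError:
        # The executable itself is missing from PATH
        logger.warning("Executable not found: %s", command[0])
        return False
    except subprocess.CalledProcessError as e:
        # check=True turns a non-zero exit status into this exception
        logger.warning("Command exited with status %s: %s", e.returncode, e.stderr)
        return False
    logger.debug("Stdout: %s", proc.stdout)
    logger.debug("Stderr: %s", proc.stderr)
    return True


# Stand-in commands: one that succeeds and one that exits with status 3
ok = run_cleanup_command([sys.executable, "-c", "raise SystemExit(0)"])
failed = run_cleanup_command([sys.executable, "-c", "raise SystemExit(3)"])
print(ok, failed)
```

Whether a failed database drop should be fatal is a design choice; the sketch treats it as a warning because it happens after the run's real work is done.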
@@ -780,6 +806,24 @@ def setup_logger(self) -> None:
 os.makedirs(self.output_directory, exist_ok=True)
 configure_logger("mfa", log_file=self.log_file)
 logger = logging.getLogger("mfa")
+if config.VERBOSE:
+try:
+response = requests.get(
+"https://api.github.com/repos/MontrealCorpusTools/Montreal-Forced-Aligner/releases/latest"
+)
+latest_version = response.json()["tag_name"].replace("v", "")
+if current_version < latest_version:
+logger.debug(
+f"You are currently running an older version of MFA ({current_version}) than the latest available ({latest_version}). "
+f"To update, please run mfa_update."
+)
+except KeyError:
+pass
+if re.search(r"\d+\.\d+\.\d+a", current_version) is not None:
+logger.debug(
+"Please be aware that you are running an alpha version of MFA. If you would like to install a more "
+"stable version, please visit https://montreal-forced-aligner.readthedocs.io/en/latest/installation.html#installing-older-versions-of-mfa",
+)
 logger.debug(f"Beginning run for {self.data_source_identifier}")
 logger.debug(f'Using "{config.CURRENT_PROFILE_NAME}" profile')
 if config.USE_MP:
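
The check above evaluates `current_version < latest_version` on strings, which orders versions lexicographically, so `"3.0.10"` sorts before `"3.0.9"`. A sketch of a numeric comparison follows. `version_tuple` and `is_outdated` are hypothetical names, not MFA functions, and pre-release suffixes such as `a1` are deliberately ignored; the `packaging` library's `Version` class handles those more completely.

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, int, int]:
    """Extract the numeric major.minor.patch triple from a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_outdated(current: str, latest: str) -> bool:
    """Compare versions numerically, component by component."""
    return version_tuple(current) < version_tuple(latest)


print("3.0.10" < "3.0.9")               # True: string comparison misorders these
print(is_outdated("3.0.10", "v3.0.9"))  # False
print(is_outdated("3.0.6", "v3.0.7"))   # True
```

Separately, `requests.get` raises subclasses of `requests.exceptions.RequestException` on network failure, which the `except KeyError` in the hunk would not catch; a `timeout=` argument and a broader handler may be worth considering for a check that runs at startup.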

montreal_forced_aligner/acoustic_modeling/trainer.py

+72 -13
@@ -500,6 +500,57 @@ def export_model(self, output_model_path: Path) -> None:
 self.training_configs[self.final_identifier].export_model(output_model_path)
 logger.info(f"Saved model to {output_model_path}")

+def quality_check_subset(self):
+from _kalpy.util import Int32VectorWriter
+from kalpy.gmm.data import AlignmentArchive
+from kalpy.utils import generate_write_specifier
+
+with self.session() as session:
+utterance_ids = set(
+x[0]
+for x in session.query(Utterance.id)
+.filter(Utterance.in_subset == True, Utterance.duration_deviation > 10) # noqa
+.all()
+)
+logger.debug(
+f"Removing {len(utterance_ids)} utterances from subset due to large duration deviations"
+)
+bulk_update(session, Utterance, [{"id": x, "in_subset": False} for x in utterance_ids])
+session.commit()
+for j in self.jobs:
+ali_paths = j.construct_path_dictionary(self.working_directory, "ali", "ark")
+temp_ali_paths = j.construct_path_dictionary(
+self.working_directory, "temp_ali", "ark"
+)
+for dict_id, ali_path in ali_paths.items():
+new_path = temp_ali_paths[dict_id]
+write_specifier = generate_write_specifier(new_path)
+writer = Int32VectorWriter(write_specifier)
+alignment_archive = AlignmentArchive(ali_path)
+
+for alignment in alignment_archive:
+if alignment.utterance_id in utterance_ids:
+continue
+writer.Write(str(alignment.utterance_id), alignment.alignment)
+del alignment_archive
+writer.Close()
+ali_path.unlink()
+new_path.rename(ali_path)
+feat_path = j.construct_path(
+j.corpus.current_subset_directory, "feats", "scp", dictionary_id=dict_id
+)
+feat_lines = []
+with mfa_open(feat_path, "r") as feat_file:
+for line in feat_file:
+utterance_id = line.split(maxsplit=1)[0]
+if utterance_id in utterance_ids:
+continue
+feat_lines.append(line)
+
+with mfa_open(feat_path, "w") as feat_file:
+for line in feat_lines:
+feat_file.write(line)
+
 def train(self) -> None:
 """
 Run through the training configurations to produce a final acoustic model
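
`quality_check_subset` rewrites `feats.scp` without the excluded utterances. One detail may be worth checking: the key read from each line is a `str`, while `utterance_ids` is built from `Utterance.id`; if those ids are integers, the `in` test never matches and no lines are dropped. The sketch below makes the conversion explicit. It assumes keys look like `speaker-utterance` or a bare utterance id, which is an assumption about the file format that this diff does not confirm, and `filter_scp_lines` is a hypothetical name.

```python
from typing import Iterable, List, Set


def filter_scp_lines(lines: Iterable[str], excluded_ids: Set[int]) -> List[str]:
    """Keep scp lines whose utterance id is not in ``excluded_ids``."""
    kept = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines rather than fail on them
        key = line.split(maxsplit=1)[0]
        # Assumed key format: "<speaker>-<utterance>" or just "<utterance>"
        utterance_id = int(key.rsplit("-", 1)[-1])
        if utterance_id in excluded_ids:
            continue
        kept.append(line)
    return kept


lines = [
    "1-10 feats.ark:5\n",
    "1-11 feats.ark:900\n",
    "12 feats.ark:1800\n",
]
kept_lines = filter_scp_lines(lines, {11, 12})
print(kept_lines)
```

Returning the kept lines instead of writing them keeps the filter easy to test; the caller can still write them back to the same path as the hunk does.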
@@ -527,20 +578,21 @@ def train(self) -> None:
 previous.exported_model_path, self.working_directory
 )
 self.align()
-if config.DEBUG:
-with self.session() as session:
-session.query(WordInterval).delete()
-session.query(PhoneInterval).delete()
-session.commit()
-self.collect_alignments()
+with self.session() as session:
+session.query(WordInterval).delete()
+session.query(PhoneInterval).delete()
+session.commit()
+self.collect_alignments()
+self.analyze_alignments()
+if self.current_subset != 0:
+self.quality_check_subset()

 self.set_current_workflow(trainer.identifier)
 if trainer.identifier.startswith("pronunciation_probabilities"):
-if config.DEBUG:
-with self.session() as session:
-session.query(WordInterval).delete()
-session.query(PhoneInterval).delete()
-session.commit()
+with self.session() as session:
+session.query(WordInterval).delete()
+session.query(PhoneInterval).delete()
+session.commit()
 trainer.train_pronunciation_probabilities()
 else:
 trainer.train()
@@ -623,6 +675,10 @@ def compute_phone_pdf_counts(self) -> None:
 logger.info("Finished accumulating transition stats!")

 def finalize_training(self):
+with self.session() as session:
+session.query(WordInterval).delete()
+session.query(PhoneInterval).delete()
+session.commit()
 self.compute_phone_pdf_counts()
 self.collect_alignments()
 self.analyze_alignments()
@@ -662,8 +718,11 @@ def num_current_utterances(self) -> int:
 def align_options(self) -> MetaDict:
 """Alignment options"""
 if self.current_aligner is not None:
-return self.current_aligner.align_options
-return super().align_options
+options = self.current_aligner.align_options
+else:
+options = super().align_options
+options["boost_silence"] = max(1.25, options["boost_silence"])
+return options

 def align(self) -> None:
 """

montreal_forced_aligner/acoustic_modeling/triphone.py

+2 -1
@@ -142,7 +142,7 @@ def __init__(self, args: TreeStatsArguments):
 self.working_directory = args.working_directory
 self.model_path = args.model_path

-def _run(self) -> typing.Generator[typing.Tuple[int, int]]:
+def _run(self):
 """Run the function"""
 with self.session() as session, thread_logger(
 "kalpy.train", self.log_path, job_name=self.job_name
@@ -166,6 +166,7 @@ def _run(self) -> typing.Generator[typing.Tuple[int, int]]:
 feature_archive = job.construct_feature_archive(self.working_directory, dict_id)
 ali_path = job.construct_path(self.working_directory, "ali", "ark", dict_id)
 train_logger.debug("Feature Archive information:")
+train_logger.debug(f"File: {feature_archive.file_name}")
 train_logger.debug(f"CMVN: {feature_archive.cmvn_read_specifier}")
 train_logger.debug(f"Deltas: {feature_archive.use_deltas}")
 train_logger.debug(f"Splices: {feature_archive.use_splices}")

montreal_forced_aligner/command_line/anchor.py

+18
@@ -4,8 +4,11 @@
 import logging
 import sys

+import requests
 import rich_click as click

+from montreal_forced_aligner import config
+
 __all__ = ["anchor_cli"]

 logger = logging.getLogger("mfa")
@@ -24,4 +27,19 @@ def anchor_cli(*args, **kwargs) -> None: # pragma: no cover
 "Anchor annotator utility is not installed, please install it via `conda install -c conda-forge anchor-annotator`."
 )
 sys.exit(1)
+if config.VERBOSE:
+try:
+from anchor._version import version
+
+response = requests.get(
+"https://api.github.com/repos/MontrealCorpusTools/Anchor-annotator/releases/latest"
+)
+latest_version = response.json()["tag_name"].replace("v", "")
+if version < latest_version:
+click.echo(
+f"You are currently running an older version of Anchor annotator ({version}) than the latest available ({latest_version}). "
+f"To update, please run mfa_update."
+)
+except ImportError:
+pass
 main()

montreal_forced_aligner/command_line/configure.py

+8 -1
@@ -37,10 +37,17 @@
 @click.option(
 "--always_clean/--never_clean",
 "clean",
-help="Turn on/off clean mode where MFA will clean temporary files before each run. "
+help="Turn on/off mode where MFA will clean temporary files before each run. "
 f"Currently defaults to {config.CLEAN}.",
 default=None,
 )
+@click.option(
+"--always_final_clean/--never_final_clean",
+"final_clean",
+help="Turn on/off mode where MFA will clean temporary files at the end of each run. "
+f"Currently defaults to {config.FINAL_CLEAN}.",
+default=None,
+)
 @click.option(
 "--always_verbose/--never_verbose",
 "verbose",

montreal_forced_aligner/command_line/mfa.py

+1 -13
@@ -3,7 +3,6 @@

 import atexit
 import logging
-import re
 import sys
 import time
 import warnings
@@ -118,17 +117,6 @@ def mfa_cli(ctx: click.Context) -> None:
 """
 from montreal_forced_aligner.command_line.utils import check_server, start_server, stop_server

-try:
-from montreal_forced_aligner._version import version
-
-if re.search(r"\d+\.\d+\.\d+a", version) is not None:
-print(
-"Please be aware that you are running an alpha version of MFA. If you would like to install a more "
-"stable version, please visit https://montreal-forced-aligner.readthedocs.io/en/latest/installation.html#installing-older-versions-of-mfa",
-file=sys.stderr,
-)
-except ImportError:
-pass
 config.load_configuration()
 auto_server = False
 run_check = True
@@ -182,7 +170,7 @@ def version_cli():
 from montreal_forced_aligner._version import version
 except ImportError:
 version = None
-print(version)
+click.echo(version)


 mfa_cli.add_command(adapt_model_cli)

montreal_forced_aligner/command_line/utils.py

+6
@@ -73,6 +73,12 @@ def common_options(f: typing.Callable) -> typing.Callable:
 help=f"Remove files from previous runs, default is {config.CLEAN}",
 default=None,
 ),
+click.option(
+"--final_clean/--no_final_clean",
+"final_clean",
+help=f"Remove temporary files at the end of run, default is {config.FINAL_CLEAN}",
+default=None,
+),
 click.option(
 "--verbose/--no_verbose",
 "-v/-nv",
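
Both new options use `default=None`, so the command line can distinguish "not specified" from an explicit on or off. A sketch of how such a tri-state value might be resolved against the configured default follows; `resolve_flag` is a hypothetical helper, and how MFA itself merges these values is not shown in this diff.

```python
from typing import Optional


def resolve_flag(cli_value: Optional[bool], configured_default: bool) -> bool:
    """Prefer an explicit command-line choice; otherwise use the configured default."""
    if cli_value is None:  # the flag was not passed on the command line
        return configured_default
    return cli_value


FINAL_CLEAN = False  # mirrors the new default added to config.py

print(resolve_flag(None, FINAL_CLEAN))  # False
print(resolve_flag(True, FINAL_CLEAN))  # True
print(resolve_flag(False, True))        # False
```

Keeping `None` as a distinct state is what lets a per-run `--no_final_clean` override a profile that was configured with `--always_final_clean`.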

montreal_forced_aligner/config.py

+1
@@ -136,6 +136,7 @@ def update_command_history(command_data: Dict[str, Any]) -> None:


 CLEAN = False
+FINAL_CLEAN = False
 VERBOSE = False
 DEBUG = False
 QUIET = False
