Bases: ModelConfigBase[ProsodyConfig]
Configuration for the speech prosody model.
Parameters:
Name |
Type |
Description |
Default |
granularity |
Optional[str]
|
The granularity at which to generate predictions.
Accepted values are word , sentence , utterance , or conversational_turn .
The default is utterance .
utterance corresponds to a natural pause or break in conversation
conversational_turn corresponds to a change in speaker.
This configuration is only available for the batch API. |
None
|
identify_speakers |
Optional[bool]
|
Whether to return identifiers for speakers over time. If true,
unique identifiers will be assigned to spoken words to differentiate different speakers. If false,
all speakers will be tagged with an "unknown" ID.
This configuration is only available for the batch API. |
None
|
window |
Optional[Dict[str, float]]
|
Sliding window used to chunk audio.
This dictionary input takes two entries: length and step representing
the width of the window in seconds and the step size in seconds.
This configuration is only available for the batch API. |
None
|
Source code in hume/models/config/prosody_config.py
| @dataclass
class ProsodyConfig(ModelConfigBase["ProsodyConfig"]):
"""Configuration for the speech prosody model.
Args:
granularity (Optional[str]): The granularity at which to generate predictions.
Accepted values are `word`, `sentence`, `utterance`, or `conversational_turn`.
The default is `utterance`.
`utterance` corresponds to a natural pause or break in conversation
`conversational_turn` corresponds to a change in speaker.
This configuration is only available for the batch API.
identify_speakers (Optional[bool]): Whether to return identifiers for speakers over time. If true,
unique identifiers will be assigned to spoken words to differentiate different speakers. If false,
all speakers will be tagged with an "unknown" ID.
This configuration is only available for the batch API.
window (Optional[Dict[str, float]]): Sliding window used to chunk audio.
This dictionary input takes two entries: `length` and `step` representing
the width of the window in seconds and the step size in seconds.
This configuration is only available for the batch API.
"""
identify_speakers: Optional[bool] = None
granularity: Optional[str] = None
window: Optional[Dict[str, float]] = None
@classmethod
def get_model_type(cls) -> ModelType:
"""Get the configuration model type.
Returns:
ModelType: Model type.
"""
return ModelType.PROSODY
|
get_model_type()
classmethod
Get the configuration model type.
Returns:
Name | Type |
Description |
ModelType |
ModelType
|
Model type. |
Source code in hume/models/config/prosody_config.py
| @classmethod
def get_model_type(cls) -> ModelType:
"""Get the configuration model type.
Returns:
ModelType: Model type.
"""
return ModelType.PROSODY
|