Spectrogram Analysis
Spectrogram data access and visualization for audio analysis.
SpectrogramData
The SpectrogramData class provides comprehensive access to Constant-Q Transform (CQT)
spectrograms for computational musicology and audio analysis.
- class idtap.SpectrogramData(data: ndarray, audio_id: str, freq_range: Tuple[float, float] = (75.0, 2400.0), bins_per_octave: int = 72, time_resolution: float | None = None)[source]
Bases: object
Constant-Q spectrogram data for IDTAP audio recordings.
This class provides access to the same high-quality constant-Q transform spectrograms used in the IDTAP web application, with tools for visualization, manipulation, and integration with matplotlib-based research workflows.
- audio_id
IDTAP audio recording ID
- freq_range
Tuple of (min_hz, max_hz) for the frequency range
- bins_per_octave
Number of frequency bins per octave
- DEFAULT_FREQ_RANGE = (75.0, 2400.0)
- DEFAULT_BINS_PER_OCTAVE = 72
- DEFAULT_TIME_RESOLUTION = 0.01508
- __init__(data: ndarray, audio_id: str, freq_range: Tuple[float, float] = (75.0, 2400.0), bins_per_octave: int = 72, time_resolution: float | None = None)[source]
Initialize SpectrogramData with raw data.
- Parameters:
data – Raw uint8 spectrogram array [freq_bins, time_frames]
audio_id – Audio recording ID
freq_range – Frequency range (min_hz, max_hz)
bins_per_octave – Number of frequency bins per octave
time_resolution – Time resolution in seconds per frame (optional). If None, falls back to DEFAULT_TIME_RESOLUTION.
- classmethod from_audio_id(audio_id: str, client: 'SwaraClient' | None = None) SpectrogramData[source]
Download and load spectrogram data from audio ID.
Fetches compressed spectrogram data from https://swara.studio/spec_data/{audio_id}/ and calculates accurate time_resolution from the audio recording duration in the database.
- Parameters:
audio_id – IDTAP audio recording ID
client – Optional SwaraClient instance (creates one if not provided)
- Returns:
SpectrogramData instance with accurate time_resolution
- Raises:
requests.HTTPError – If spectrogram data doesn’t exist or download fails
- classmethod from_piece(piece: Piece, client: 'SwaraClient' | None = None) 'SpectrogramData' | None[source]
Load spectrogram data from a Piece object.
- Parameters:
piece – Piece object with audio_id attribute
client – Optional SwaraClient instance
- Returns:
SpectrogramData instance, or None if piece has no audio_id
- apply_intensity(power: float = 1.0) ndarray[source]
Apply power-law intensity transformation (matches web app behavior).
This transformation enhances visual contrast in the spectrogram. Formula: output = (input^power / 255^power) * 255
- Parameters:
power – Exponent for the power transform (1.0-5.0): 1.0 is linear (no change); values above 1.0 increase contrast.
- Returns:
Transformed uint8 array with same shape as input
- Raises:
ValueError – If power is outside valid range [1.0, 5.0]
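The formula above can be sketched with NumPy as follows. This is a minimal illustration and not the library's implementation: the function name power_law_intensity is hypothetical, and rounding to the nearest integer (rather than truncating) is an assumption.

```python
import numpy as np

def power_law_intensity(data: np.ndarray, power: float = 1.0) -> np.ndarray:
    """Hypothetical stand-in for apply_intensity: (x^p / 255^p) * 255."""
    if not 1.0 <= power <= 5.0:
        raise ValueError(f"power must be in [1.0, 5.0], got {power}")
    # Work in float so the exponentiation does not overflow uint8.
    scaled = (data.astype(np.float64) / 255.0) ** power * 255.0
    # Assumption: round to nearest; the real method may truncate instead.
    return np.round(scaled).astype(np.uint8)

sample = np.array([[0, 64, 128, 255]], dtype=np.uint8)
linear = power_law_intensity(sample, 1.0)   # unchanged
squared = power_law_intensity(sample, 2.0)  # mid-range values pushed down
```

Because 0 and 255 map to themselves for any exponent, the transform keeps the full output range and only darkens the mid-range values, which is what makes strong partials stand out.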
- apply_colormap(data: ndarray | None = None, cmap: str = 'viridis') ndarray[source]
Apply matplotlib colormap to spectrogram data.
- Parameters:
data – Input spectrogram data (if None, uses self._data)
cmap – Matplotlib colormap name (see SUPPORTED_COLORMAPS)
- Returns:
RGB array of shape [height, width, 3] with uint8 values
- Raises:
ValueError – If colormap name is not recognized
- crop_frequency(min_hz: float | None = None, max_hz: float | None = None) SpectrogramData[source]
Crop spectrogram to a specific frequency range.
- Parameters:
min_hz – Minimum frequency (Hz), defaults to original min
max_hz – Maximum frequency (Hz), defaults to original max
- Returns:
New SpectrogramData instance with cropped data
- crop_time(start_time: float | None = None, end_time: float | None = None) SpectrogramData[source]
Crop spectrogram to a specific time range.
- Parameters:
start_time – Start time in seconds (defaults to 0)
end_time – End time in seconds (defaults to duration)
- Returns:
New SpectrogramData instance with cropped data
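As a rough sketch of what time cropping involves, the snippet below converts a time range in seconds to frame indices and slices a [freq_bins, time_frames] array. The helper crop_frames is hypothetical, and the choice to floor the start index and ceil the end index is an assumption; the library's own rounding may differ.

```python
import math
import numpy as np

def crop_frames(data, time_resolution, start_time=None, end_time=None):
    """Hypothetical helper: slice [freq_bins, time_frames] by time in seconds."""
    n_frames = data.shape[1]
    duration = n_frames * time_resolution
    start = 0.0 if start_time is None else max(0.0, start_time)
    end = duration if end_time is None else min(duration, end_time)
    if end <= start:
        raise ValueError("end_time must be greater than start_time")
    # Assumption: include every frame that overlaps the requested range.
    first = math.floor(start / time_resolution)
    last = min(n_frames, math.ceil(end / time_resolution))
    return data[:, first:last]

# Toy data: 360 bins, 1000 frames at 1/64 s per frame (an illustrative value).
frames = np.zeros((360, 1000), dtype=np.uint8)
clip = crop_frames(frames, 1 / 64, start_time=2.0, end_time=5.0)
```

Here the 3-second window spans 192 frames, so clip has shape (360, 192).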
- get_extent() List[float][source]
Get matplotlib extent for this spectrogram.
- Returns:
[left, right, bottom, top] = [0, duration, min_freq, max_freq]. This is the format matplotlib imshow() expects for its extent parameter.
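A minimal sketch of how such an extent could be assembled, assuming the duration equals the number of time frames multiplied by time_resolution. The helper extent_for is hypothetical and is not part of the idtap API.

```python
def extent_for(n_frames, time_resolution, freq_range):
    """Hypothetical helper returning [left, right, bottom, top]."""
    # Assumption: the spectrogram starts at t = 0 and frames are evenly spaced.
    duration = n_frames * time_resolution
    min_hz, max_hz = freq_range
    return [0.0, duration, min_hz, max_hz]

# 640 frames at 1/64 s per frame (an illustrative value) span 10 seconds.
extent = extent_for(n_frames=640, time_resolution=1 / 64, freq_range=(75.0, 2400.0))
```

Passing this list as extent to imshow(), together with origin='lower', places the lowest frequency bin at the bottom of the axis.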
- get_plot_data(power: float = 1.0, apply_cmap: bool = False, cmap: str = 'viridis') Tuple[ndarray, List[float]][source]
Get processed spectrogram data and extent for matplotlib plotting.
Use this when you need direct control over the plotting process, or when you want to manipulate the data before plotting.
- Parameters:
power – Intensity power transform (1.0-5.0)
apply_cmap – If True, returns RGB array; if False, returns grayscale uint8
cmap – Colormap name (only used if apply_cmap=True)
- Returns:
data – Processed spectrogram array. If apply_cmap=False: uint8 array [freq_bins, time_frames]. If apply_cmap=True: RGB uint8 array [freq_bins, time_frames, 3].
extent – [left, right, bottom, top] for matplotlib imshow()
- Return type:
Tuple of (data, extent)
Example
>>> # Low-level control
>>> data, extent = spec.get_plot_data(power=2.5)
>>> fig, ax = plt.subplots()
>>> im = ax.imshow(data, extent=extent, aspect='auto',
...                origin='lower', cmap='magma')
- plot_on_axis(ax: Axes, power: float = 1.0, cmap: str = 'viridis', alpha: float = 1.0, zorder: int = 0, log_freq: bool = True, **imshow_kwargs) AxesImage[source]
Plot spectrogram on an existing matplotlib axis (for overlays).
This is the primary method for using spectrograms as underlays in custom matplotlib visualizations.
- Parameters:
ax – Matplotlib axis to plot on
power – Intensity power transform (1.0-5.0)
cmap – Matplotlib colormap name
alpha – Transparency (0.0-1.0), useful for subtle underlays
zorder – Drawing order (0 = background, higher = foreground)
log_freq – Whether to use logarithmic frequency scale (default: True)
**imshow_kwargs – Additional arguments passed to ax.imshow()
- Returns:
AxesImage object (useful for adding colorbars)
Example
>>> fig, ax = plt.subplots(figsize=(12, 6))
>>> im = spec.plot_on_axis(ax, power=2.0, cmap='viridis', alpha=0.7)
>>> ax.plot(times, pitch_contour, 'r-', linewidth=2)  # Overlay pitch
>>> ax.set_xlabel('Time (s)')
>>> ax.set_ylabel('Frequency (Hz)')
>>> plt.colorbar(im, ax=ax, label='Intensity')
- to_image(width: int | None = None, height: int | None = None, power: float = 1.0, cmap: str = 'viridis', interpolation: str = 'bilinear') Image[source]
Generate PIL Image with full processing pipeline.
- Parameters:
width – Output width in pixels (default: original width)
height – Output height in pixels (default: original height)
power – Intensity power transform (1.0-5.0)
cmap – Matplotlib colormap name
interpolation – Resampling method (‘bilinear’, ‘nearest’, ‘lanczos’, etc.). See PIL.Image.Resampling for all options.
- Returns:
PIL Image in RGB mode
- to_matplotlib(figsize: Tuple[float, float] = (12, 6), power: float = 1.0, cmap: str = 'viridis', show_colorbar: bool = True, show_axes: bool = True, log_freq: bool = True) Figure[source]
Generate standalone matplotlib Figure for publication.
Use this for quick visualization. For overlays and custom plots, use plot_on_axis() instead.
- Parameters:
figsize – Figure size (width, height) in inches
power – Intensity power transform (1.0-5.0)
cmap – Matplotlib colormap name
show_colorbar – Whether to show colorbar
show_axes – Whether to show frequency/time axis labels
log_freq – Whether to use logarithmic frequency scale (default: True)
- Returns:
Matplotlib Figure object
- save(filepath: str, width: int | None = None, height: int | None = None, power: float = 1.0, cmap: str = 'viridis', format: str | None = None, **kwargs)[source]
Save spectrogram as image file.
- Parameters:
filepath – Output file path
width – Output width in pixels (default: original)
height – Output height in pixels (default: original)
power – Intensity power transform (1.0-5.0)
cmap – Matplotlib colormap name
format – Image format (‘png’, ‘jpg’, ‘webp’, etc.). Auto-detected from the filepath extension if not provided.
**kwargs – Additional arguments passed to PIL Image.save()
- property time_resolution: float
Time resolution in seconds per frame.
Calculated from audio recording duration in database (when available). Falls back to DEFAULT_TIME_RESOLUTION if recording metadata unavailable.
Note: Spectrograms always cover the full audio recording, even when the associated Piece transcribes only an excerpt.
- property freq_bins: ndarray
Array of frequency values (Hz) for each bin.
Calculated from bins_per_octave and freq_range using log spacing.
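The log spacing can be illustrated with a short NumPy sketch. The function log_spaced_bins is hypothetical, and two details are assumptions rather than documented behavior: that the bin count is bins_per_octave times the number of octaves in freq_range, and that the upper edge max_hz itself is excluded.

```python
import numpy as np

def log_spaced_bins(freq_range=(75.0, 2400.0), bins_per_octave=72):
    """Hypothetical helper: centre frequency (Hz) of each constant-Q bin."""
    min_hz, max_hz = freq_range
    n_octaves = np.log2(max_hz / min_hz)
    # Assumption: whole number of bins, upper edge not included.
    n_bins = int(round(n_octaves * bins_per_octave))
    # Each step multiplies the frequency by 2^(1 / bins_per_octave).
    return min_hz * 2.0 ** (np.arange(n_bins) / bins_per_octave)

bins = log_spaced_bins()
```

With the defaults, 75-2400 Hz covers exactly five octaves, so this sketch yields 360 bins. At 72 bins per octave, adjacent bins are 1200 / 72 ≈ 16.7 cents apart, or six bins per equal-tempered semitone.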
Key Features
Constant-Q Transform (CQT) - Log-spaced frequency bins for musical analysis
Intensity Transformation - Power-law contrast enhancement (1.0-5.0)
Colormap Support - 35+ matplotlib colormaps
Frequency/Time Cropping - Extract specific frequency ranges or time segments
Matplotlib Integration - Plot on existing axes for overlays with pitch contours
Image Export - Save as PNG, JPEG, WebP, etc.
Quick Examples
Load and display a spectrogram:
from idtap import SwaraClient, SpectrogramData
client = SwaraClient()
spec = SpectrogramData.from_audio_id("audio_id_here", client)
# Save basic visualization
spec.save("output.png", power=2.0, cmap='viridis')
Create matplotlib overlay with pitch contour:
import matplotlib.pyplot as plt
# Load the spectrogram for an existing Piece object
# (from_piece returns None if the piece has no audio_id)
spec = SpectrogramData.from_piece(piece, client)
# Create figure
fig, ax = plt.subplots(figsize=(12, 6))
# Plot spectrogram as underlay with transparency
im = spec.plot_on_axis(ax, power=2.0, cmap='viridis', alpha=0.7, zorder=0)
# Overlay pitch contour
times = [traj.start_time for traj in piece.trajectories]
pitches = [traj.pitch_contour[0] for traj in piece.trajectories]
ax.plot(times, pitches, 'r-', linewidth=2, zorder=1)
# Configure axes
ax.set_xlabel('Time (s)')
ax.set_ylabel('Frequency (Hz)')
plt.colorbar(im, ax=ax, label='Intensity')
plt.savefig('overlay.png', dpi=150, bbox_inches='tight')
Crop to specific region:
# Extract 200-800 Hz range, first 10 seconds
cropped = spec.crop_frequency(200, 800).crop_time(0, 10)
cropped.save("cropped.png", power=2.5, cmap='magma')
Supported Colormaps
- idtap.SUPPORTED_COLORMAPS
Available colormaps include: viridis, plasma, magma, inferno, hot, cool, gray, and many more. See the matplotlib colormap documentation for visual examples.
Technical Details
Algorithm: Essentia NSGConstantQ (Non-Stationary Gabor Constant-Q Transform)
Default Frequency Range: 75-2400 Hz
Default Bins Per Octave: 72 (high resolution for microtonal analysis)
Data Format: uint8 grayscale (0-255), gzip-compressed
Time Resolution: ~0.0116 seconds per frame (typical); the DEFAULT_TIME_RESOLUTION fallback is 0.01508
Frequency Scale: Logarithmic (perceptually-uniform for music)