Spectrogram Analysis

Spectrogram data access and visualization for audio analysis.

SpectrogramData

The SpectrogramData class provides comprehensive access to Constant-Q Transform (CQT) spectrograms for computational musicology and audio analysis.

class idtap.SpectrogramData(data: ndarray, audio_id: str, freq_range: Tuple[float, float] = (75.0, 2400.0), bins_per_octave: int = 72, time_resolution: float | None = None)

Bases: object

Constant-Q spectrogram data for IDTAP audio recordings.

This class provides access to the same high-quality constant-Q transform spectrograms used in the IDTAP web application, with tools for visualization, manipulation, and integration with matplotlib-based research workflows.

audio_id

IDTAP audio recording ID

freq_range

Tuple of (min_hz, max_hz) for the frequency range

bins_per_octave

Number of frequency bins per octave

DEFAULT_FREQ_RANGE = (75.0, 2400.0)
DEFAULT_BINS_PER_OCTAVE = 72
DEFAULT_TIME_RESOLUTION = 0.01508
__init__(data: ndarray, audio_id: str, freq_range: Tuple[float, float] = (75.0, 2400.0), bins_per_octave: int = 72, time_resolution: float | None = None)

Initialize SpectrogramData with raw data.

Parameters:
  • data – Raw uint8 spectrogram array [freq_bins, time_frames]

  • audio_id – Audio recording ID

  • freq_range – Frequency range (min_hz, max_hz)

  • bins_per_octave – Number of frequency bins per octave

  • time_resolution – Time resolution in seconds per frame (optional). If None, falls back to DEFAULT_TIME_RESOLUTION.

classmethod from_audio_id(audio_id: str, client: 'SwaraClient' | None = None) -> SpectrogramData

Download and load spectrogram data from audio ID.

Fetches compressed spectrogram data from https://swara.studio/spec_data/{audio_id}/ and calculates accurate time_resolution from the audio recording duration in the database.

Parameters:
  • audio_id – IDTAP audio recording ID

  • client – Optional SwaraClient instance (creates one if not provided)

Returns:

SpectrogramData instance with accurate time_resolution

Raises:

requests.HTTPError – If spectrogram data doesn’t exist or download fails

classmethod from_piece(piece: Piece, client: 'SwaraClient' | None = None) -> 'SpectrogramData' | None

Load spectrogram data from a Piece object.

Parameters:
  • piece – Piece object with audio_id attribute

  • client – Optional SwaraClient instance

Returns:

SpectrogramData instance, or None if piece has no audio_id

apply_intensity(power: float = 1.0) -> ndarray

Apply power-law intensity transformation (matches web app behavior).

This transformation enhances visual contrast in the spectrogram. Formula: output = (input^power / 255^power) * 255

Parameters:

power – Exponent for the power transform, in the range 1.0-5.0. A value of 1.0 is linear (no change); values above 1.0 increase contrast.

Returns:

Transformed uint8 array with same shape as input

Raises:

ValueError – If power is outside valid range [1.0, 5.0]
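
The formula above can be sketched without any dependencies. This is a minimal illustration, not the library's implementation: the helper names intensity_lut and apply_lut are hypothetical, and rounding to the nearest integer is an assumption. Because the input is uint8, a 256-entry lookup table covers every possible value.

```python
def intensity_lut(power: float) -> list[int]:
    """Build a 256-entry table for output = (input^power / 255^power) * 255."""
    if not 1.0 <= power <= 5.0:
        raise ValueError(f"power must be in [1.0, 5.0], got {power}")
    # Assumption: round to the nearest integer; the library may truncate instead.
    return [round((v ** power) / (255.0 ** power) * 255.0) for v in range(256)]


def apply_lut(rows: list[list[int]], lut: list[int]) -> list[list[int]]:
    """Map every uint8 value of a [freq_bins][time_frames] grid through the table."""
    return [[lut[v] for v in row] for row in rows]


lut = intensity_lut(2.0)
pixels = [[0, 64, 128], [192, 255, 32]]  # synthetic uint8 values
transformed = apply_lut(pixels, lut)
print(transformed)
```

With NumPy, the same table can be applied to a whole array in one step by indexing, for example np.array(lut, dtype=np.uint8)[data].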

apply_colormap(data: ndarray | None = None, cmap: str = 'viridis') -> ndarray

Apply matplotlib colormap to spectrogram data.

Parameters:
  • data – Input spectrogram data (if None, uses self._data)

  • cmap – Matplotlib colormap name (see SUPPORTED_COLORMAPS)

Returns:

RGB array of shape [height, width, 3] with uint8 values

Raises:

ValueError – If colormap name is not recognized

crop_frequency(min_hz: float | None = None, max_hz: float | None = None) -> SpectrogramData

Crop spectrogram to a specific frequency range.

Parameters:
  • min_hz – Minimum frequency (Hz), defaults to original min

  • max_hz – Maximum frequency (Hz), defaults to original max

Returns:

New SpectrogramData instance with cropped data
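
With log-spaced bins, a frequency in Hz maps to a bin index through a base-2 logarithm. The sketch below shows only that arithmetic, under stated assumptions: hz_to_bin is a hypothetical helper, bin 0 is assumed to sit at the minimum frequency, and nearest-bin rounding is assumed. The library may round or clamp differently.

```python
import math


def hz_to_bin(freq_hz: float, min_hz: float = 75.0, bins_per_octave: int = 72) -> int:
    """Index of the log-spaced bin nearest to freq_hz, counting up from min_hz."""
    if freq_hz <= 0 or min_hz <= 0:
        raise ValueError("frequencies must be positive")
    return round(bins_per_octave * math.log2(freq_hz / min_hz))


# Bin range for a 200-800 Hz crop with the default 75 Hz floor and 72 bins per octave.
lo = hz_to_bin(200.0)
hi = hz_to_bin(800.0)
print(lo, hi, hi - lo)  # the width is two octaves' worth of bins
```

Since 800 Hz is exactly two octaves above 200 Hz, the cropped range spans 2 × 72 bins regardless of where the floor sits.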

crop_time(start_time: float | None = None, end_time: float | None = None) -> SpectrogramData

Crop spectrogram to a specific time range.

Parameters:
  • start_time – Start time in seconds (defaults to 0)

  • end_time – End time in seconds (defaults to duration)

Returns:

New SpectrogramData instance with cropped data
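
Cropping by time amounts to converting seconds into frame indices using the time resolution. The following is a rough sketch of that conversion, not the library's code: time_to_frame is a hypothetical helper, and both the nearest-frame rounding and the clamping to the valid range are assumptions.

```python
def time_to_frame(t_seconds: float, time_resolution: float, n_frames: int) -> int:
    """Convert a time offset to a frame index, clamped to [0, n_frames]."""
    if time_resolution <= 0:
        raise ValueError("time_resolution must be positive")
    frame = round(t_seconds / time_resolution)
    return max(0, min(n_frames, frame))


time_resolution = 0.01508  # DEFAULT_TIME_RESOLUTION from the class attributes
n_frames = 2000            # synthetic frame count (about 30 seconds)

start = time_to_frame(0.0, time_resolution, n_frames)
end = time_to_frame(10.0, time_resolution, n_frames)
print(start, end)  # a column slice [start:end] would cover the first 10 seconds
```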

get_extent() -> List[float]

Get matplotlib extent for this spectrogram.

Returns:

[left, right, bottom, top] = [0, duration, min_freq, max_freq]. This is the format that matplotlib's imshow() expects for its extent parameter.
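
As a small worked example of how these four numbers relate to the other properties, the sketch below derives an extent from a frame count, a time resolution, and a frequency range. The helper extent_for is hypothetical, and computing the duration as frames × time resolution is an assumption based on the duration property being "estimated from time frames".

```python
def extent_for(n_frames: int, time_resolution: float,
               freq_range: tuple[float, float]) -> list[float]:
    """Return [left, right, bottom, top] for matplotlib's imshow(extent=...)."""
    duration = n_frames * time_resolution  # assumed: seconds covered by all frames
    min_hz, max_hz = freq_range
    return [0.0, duration, min_hz, max_hz]


# Synthetic spectrogram: 2000 frames at the default resolution and frequency range.
extent = extent_for(2000, 0.01508, (75.0, 2400.0))
print(extent)
```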

get_plot_data(power: float = 1.0, apply_cmap: bool = False, cmap: str = 'viridis') -> Tuple[ndarray, List[float]]

Get processed spectrogram data and extent for matplotlib plotting.

Use this when you need direct control over the plotting process, or when you want to manipulate the data before plotting.

Parameters:
  • power – Intensity power transform (1.0-5.0)

  • apply_cmap – If True, returns RGB array; if False, returns grayscale uint8

  • cmap – Colormap name (only used if apply_cmap=True)

Returns:

  • data: Processed spectrogram array

    If apply_cmap=False: uint8 array [freq_bins, time_frames]
    If apply_cmap=True: RGB uint8 array [freq_bins, time_frames, 3]

  • extent: [left, right, bottom, top] for matplotlib imshow()

Return type:

Tuple of (data, extent)

Example

>>> # Low-level control
>>> data, extent = spec.get_plot_data(power=2.5)
>>> fig, ax = plt.subplots()
>>> im = ax.imshow(data, extent=extent, aspect='auto',
...                origin='lower', cmap='magma')

plot_on_axis(ax: Axes, power: float = 1.0, cmap: str = 'viridis', alpha: float = 1.0, zorder: int = 0, log_freq: bool = True, **imshow_kwargs) -> AxesImage

Plot spectrogram on an existing matplotlib axis (for overlays).

This is the primary method for using spectrograms as underlays in custom matplotlib visualizations.

Parameters:
  • ax – Matplotlib axis to plot on

  • power – Intensity power transform (1.0-5.0)

  • cmap – Matplotlib colormap name

  • alpha – Transparency (0.0-1.0), useful for subtle underlays

  • zorder – Drawing order (0 = background, higher = foreground)

  • log_freq – Whether to use logarithmic frequency scale (default: True)

  • **imshow_kwargs – Additional arguments passed to ax.imshow()

Returns:

AxesImage object (useful for adding colorbars)

Example

>>> fig, ax = plt.subplots(figsize=(12, 6))
>>> im = spec.plot_on_axis(ax, power=2.0, cmap='viridis', alpha=0.7)
>>> ax.plot(times, pitch_contour, 'r-', linewidth=2)  # Overlay pitch
>>> ax.set_xlabel('Time (s)')
>>> ax.set_ylabel('Frequency (Hz)')
>>> plt.colorbar(im, ax=ax, label='Intensity')

to_image(width: int | None = None, height: int | None = None, power: float = 1.0, cmap: str = 'viridis', interpolation: str = 'bilinear') -> Image

Generate PIL Image with full processing pipeline.

Parameters:
  • width – Output width in pixels (default: original width)

  • height – Output height in pixels (default: original height)

  • power – Intensity power transform (1.0-5.0)

  • cmap – Matplotlib colormap name

  • interpolation – Resampling method (‘bilinear’, ‘nearest’, ‘lanczos’, etc.). See PIL.Image.Resampling for all options.

Returns:

PIL Image in RGB mode

to_matplotlib(figsize: Tuple[float, float] = (12, 6), power: float = 1.0, cmap: str = 'viridis', show_colorbar: bool = True, show_axes: bool = True, log_freq: bool = True) -> Figure

Generate standalone matplotlib Figure for publication.

Use this for quick visualization. For overlays and custom plots, use plot_on_axis() instead.

Parameters:
  • figsize – Figure size (width, height) in inches

  • power – Intensity power transform (1.0-5.0)

  • cmap – Matplotlib colormap name

  • show_colorbar – Whether to show colorbar

  • show_axes – Whether to show frequency/time axis labels

  • log_freq – Whether to use logarithmic frequency scale (default: True)

Returns:

Matplotlib Figure object

save(filepath: str, width: int | None = None, height: int | None = None, power: float = 1.0, cmap: str = 'viridis', format: str | None = None, **kwargs)

Save spectrogram as image file.

Parameters:
  • filepath – Output file path

  • width – Output width in pixels (default: original)

  • height – Output height in pixels (default: original)

  • power – Intensity power transform (1.0-5.0)

  • cmap – Matplotlib colormap name

  • format – Image format (‘png’, ‘jpg’, ‘webp’, etc.). Auto-detected from the filepath extension if not provided.

  • **kwargs – Additional arguments passed to PIL Image.save()

property shape: Tuple[int, int]

Data shape: (frequency_bins, time_frames).

property duration: float

Audio duration in seconds (estimated from time frames).

property time_resolution: float

Time resolution in seconds per frame.

Calculated from audio recording duration in database (when available). Falls back to DEFAULT_TIME_RESOLUTION if recording metadata unavailable.

Note: Spectrograms always cover the full audio recording, even when the associated Piece transcribes only an excerpt.
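
The fallback behaviour described above can be sketched as follows. This is an assumption-laden illustration rather than the library's code: estimate_time_resolution is a hypothetical helper, and dividing the recording duration by the number of frames is assumed from the statement that the value is calculated from the recording duration.

```python
from typing import Optional

DEFAULT_TIME_RESOLUTION = 0.01508  # seconds per frame, as listed on the class


def estimate_time_resolution(duration_seconds: Optional[float], n_frames: int) -> float:
    """Seconds per frame from the recording duration, or the default as a fallback."""
    if duration_seconds is None or duration_seconds <= 0 or n_frames <= 0:
        # Recording metadata unavailable or unusable: use the documented default.
        return DEFAULT_TIME_RESOLUTION
    return duration_seconds / n_frames


known = estimate_time_resolution(60.0, 4000)    # duration known from metadata
unknown = estimate_time_resolution(None, 4000)  # no metadata available
print(known, unknown)
```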

property freq_bins: ndarray

Array of frequency values (Hz) for each bin.

Calculated from bins_per_octave and freq_range using log spacing.
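
Log spacing means each bin is a fixed frequency ratio above the previous one. A minimal sketch of that calculation is shown below, assuming bin k sits at min_hz × 2^(k / bins_per_octave) and that the upper edge of the range is excluded; log_spaced_bins is a hypothetical helper, and the library's exact bin count or edge handling may differ.

```python
import math


def log_spaced_bins(freq_range: tuple[float, float] = (75.0, 2400.0),
                    bins_per_octave: int = 72) -> list[float]:
    """Center frequencies (Hz) of log-spaced bins covering freq_range."""
    min_hz, max_hz = freq_range
    n_octaves = math.log2(max_hz / min_hz)
    n_bins = round(bins_per_octave * n_octaves)  # assumed: upper edge excluded
    return [min_hz * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


bins = log_spaced_bins()
print(len(bins), bins[0], bins[72])  # bin 72 is one octave above bin 0
```

With the defaults, 75 Hz to 2400 Hz is exactly five octaves, and 72 bins per octave gives a spacing of 1200 / 72 ≈ 16.7 cents between adjacent bins.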

Key Features

  • Constant-Q Transform (CQT) - Log-spaced frequency bins for musical analysis

  • Intensity Transformation - Power-law contrast enhancement (1.0-5.0)

  • Colormap Support - 35+ matplotlib colormaps

  • Frequency/Time Cropping - Extract specific frequency ranges or time segments

  • Matplotlib Integration - Plot on existing axes for overlays with pitch contours

  • Image Export - Save as PNG, JPEG, WebP, etc.

Quick Examples

Load and display a spectrogram:

from idtap import SwaraClient, SpectrogramData

client = SwaraClient()
spec = SpectrogramData.from_audio_id("audio_id_here", client)

# Save basic visualization
spec.save("output.png", power=2.0, cmap='viridis')

Create matplotlib overlay with pitch contour:

import matplotlib.pyplot as plt

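# Load the spectrogram for an existing Piece
# (`piece` and `client` are assumed to be defined already)
spec = SpectrogramData.from_piece(piece, client)
if spec is None:  # from_piece returns None when the piece has no audio_id
    raise ValueError("piece has no audio_id, so no spectrogram is available")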

# Create figure
fig, ax = plt.subplots(figsize=(12, 6))

# Plot spectrogram as underlay with transparency
im = spec.plot_on_axis(ax, power=2.0, cmap='viridis', alpha=0.7, zorder=0)

# Overlay pitch contour
times = [traj.start_time for traj in piece.trajectories]
pitches = [traj.pitch_contour[0] for traj in piece.trajectories]
ax.plot(times, pitches, 'r-', linewidth=2, zorder=1)

# Configure axes
ax.set_xlabel('Time (s)')
ax.set_ylabel('Frequency (Hz)')
plt.colorbar(im, ax=ax, label='Intensity')

plt.savefig('overlay.png', dpi=150, bbox_inches='tight')

Crop to specific region:

# Extract 200-800 Hz range, first 10 seconds
cropped = spec.crop_frequency(200, 800).crop_time(0, 10)
cropped.save("cropped.png", power=2.5, cmap='magma')

Supported Colormaps

idtap.SUPPORTED_COLORMAPS

List of the matplotlib colormap names that apply_colormap() recognizes.

Available colormaps include: viridis, plasma, magma, inferno, hot, cool, gray, and many more. See the matplotlib colormap documentation for visual examples.

Technical Details

  • Algorithm: Essentia NSGConstantQ (Non-Stationary Gabor Constant-Q Transform)

  • Default Frequency Range: 75-2400 Hz

  • Default Bins Per Octave: 72 (high resolution for microtonal analysis)

  • Data Format: uint8 grayscale (0-255), gzip-compressed

  • Time Resolution: ~0.0116 seconds per frame (typical); DEFAULT_TIME_RESOLUTION (0.01508) is the fallback when the recording duration is unavailable

  • Frequency Scale: Logarithmic (perceptually uniform for music)
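
To make the "uint8 grayscale, gzip-compressed" data format concrete, here is a round-trip sketch on synthetic data. The byte layout is an assumption made purely for illustration (row-major, one byte per value, no header), and unpack_spectrogram is a hypothetical helper; the actual file layout served by the IDTAP backend is not described here and may differ.

```python
import gzip


def unpack_spectrogram(payload: bytes, freq_bins: int) -> list[list[int]]:
    """Decompress a gzip payload into freq_bins rows of uint8 values."""
    raw = gzip.decompress(payload)
    if freq_bins <= 0 or len(raw) % freq_bins != 0:
        raise ValueError("payload size is not a multiple of freq_bins")
    n_frames = len(raw) // freq_bins
    # Assumed layout: row-major, so each row holds one frequency bin over time.
    return [list(raw[r * n_frames:(r + 1) * n_frames]) for r in range(freq_bins)]


# Build a tiny synthetic spectrogram (3 bins x 4 frames) and round-trip it.
rows = [[0, 10, 20, 30], [40, 50, 60, 70], [255, 254, 253, 252]]
payload = gzip.compress(bytes(v for row in rows for v in row))
restored = unpack_spectrogram(payload, freq_bins=3)
print(restored == rows)
```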