These questions relate to how Wiretap handles audio media.
Audio and video media are quite different from an I/O standpoint. Audio samples are extremely small: one second of uncompressed audio at 44.1 kHz requires no more than about 172 KB, whereas one second of uncompressed 8-bit NTSC video requires about 30 MB.
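The figures above can be reproduced with simple arithmetic. The following is a minimal sketch, not part of the Wiretap API. It assumes 16-bit stereo audio and 720x486 NTSC video at 29.97 frames per second with three 8-bit channels per pixel; these parameters are assumptions chosen because they match the quoted sizes, and the function names are hypothetical.

```python
# Back-of-the-envelope data rates. All format parameters are assumptions.

def audio_bytes_per_second(sample_rate_hz, channels, bytes_per_sample):
    """Uncompressed audio data rate in bytes per second."""
    return sample_rate_hz * channels * bytes_per_sample


def video_bytes_per_second(width, height, bytes_per_pixel, frames_per_second):
    """Uncompressed video data rate in bytes per second."""
    return width * height * bytes_per_pixel * frames_per_second


# Assumed audio format: 44.1 kHz, stereo, 16-bit (2 bytes) samples.
audio_rate = audio_bytes_per_second(44_100, 2, 2)

# Assumed video format: 720x486, 3 bytes per pixel (8-bit RGB), 29.97 fps.
video_rate = video_bytes_per_second(720, 486, 3, 29.97)

print(f"audio: {audio_rate / 1024:.0f} KB/s")           # audio: 172 KB/s
print(f"video: {video_rate / (1024 * 1024):.0f} MB/s")  # video: 30 MB/s
print(f"ratio: {video_rate / audio_rate:.0f}x")         # ratio: 178x
```

Under these assumptions, one second of video carries roughly 178 times as much data as one second of audio.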
Wiretap uses a client-server architecture that operates across a network, so you must be aware of the size of each I/O operation to keep the network from being clogged with small audio media requests. For this reason, the Wiretap frame API is currently used to read blocks of audio samples. Each frame of audio represents a block of samples. The Wiretap server decides the size of that block from the number of frames and the WireTapClipFormat object of the audio clip node in question. For more information, see Memory Required per Frame.
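To illustrate why reading audio in frame-sized blocks reduces network traffic, the sketch below compares the number of requests needed with and without blocking. It does not call the Wiretap API. The actual block size is decided by the Wiretap server; the even split used here, the function name samples_per_block, and the clip parameters are all assumptions made for illustration only.

```python
def samples_per_block(total_samples, num_frames):
    """Hypothetical even split of a clip's samples across its frames.

    This is an illustrative assumption; the real block size is
    determined by the Wiretap server.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # Integer ceiling division so the last block covers any remainder.
    return (total_samples + num_frames - 1) // num_frames


# Assumed clip: 10 seconds of 44.1 kHz audio exposed as 300 frames.
total_samples = 44_100 * 10
num_frames = 300

block = samples_per_block(total_samples, num_frames)

print(f"samples per frame-sized block: {block}")
print(f"requests when reading one sample at a time: {total_samples}")
print(f"requests when reading one frame at a time: {num_frames}")
```

For this assumed clip, each block holds 1470 samples, so reading by frame takes 300 requests instead of 441,000.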