Re: FT2232H asynchronous maximum data rates?

From:	Caleb Kemere <ckemere@xxxxxxxxx>
To:	libftdi@xxxxxxxxxxxxxxxxxxxxxxx
Date:	Thu, 23 Jun 2011 15:51:15 -0700

After getting some personalized consulting from a friend (shoutout for
his company: www.totalphase.com) and delving into the libusb-1.0
code, I think I understand pretty clearly what is required to get
maximum data rates from the FT2232H. For a situation like mine, where
I want streaming bulk transfer at 6.5 MB/s without interruption for
several hours, the main obstacle to maximum throughput is the
inefficiency inherent in generating data requests ("URBs") for the
kernel USB driver.

1. Let's begin with the simplest program, the one I started out with:
(a) initialize ftdi data
(b) loop forever around ftdi_read_data

In this case, each time I call ftdi_read_data, both libftdi and libusb
create and populate a bunch of structures which basically assist
in generating a list of read requests for the data transfer ("URBs"),
which get sent to the kernel and from there to the host controller. In
Linux, the maximum request is 16 kB. Let's say I ask for 2 MB of data:
that means that a list of 128 URBs is generated and then sent to the
driver all at once. When all the data has been received, the function
returns. Since in my program it's wrapped inside an infinite loop,
the same list of URBs is then regenerated and sent out to the
hardware. The problem that I was running into was that occasionally
this list generation took too long, and the FT2232H's on-chip buffer
would fill up (resulting in an unacceptable loss of data).

2. The next step was to see if I could bypass libftdi in generating
the bulk transfer request and just use libusb's
"libusb_bulk_transfer" function. The structure of the code is still the
same: the infinite loop just wraps libusb_bulk_transfer now (which is
populated using the data pulled from the ftdi structure). This
improved the process, since now only libusb has to allocate a list of
requests. However, while the rate of buffer overflows dropped by
about a factor of 10, they still happened. Furthermore, to get this
performance I had to ask for 2 MB at a time, but since I'm actually
acquiring data at 25 kHz, that meant that my effective latency was
almost half a second.
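
As a back-of-the-envelope check on that latency, here is a minimal
sketch in C. It assumes the stream runs at a steady 6.5 MB/s and that
latency is simply the time it takes to fill one request; the function
names are made up for illustration and are not part of libftdi or
libusb.

```c
#include <stddef.h>

/* Seconds of data held in one request of `request_bytes` bytes when the
 * device streams at `rate_bytes_per_s` (assumed constant). */
double request_latency_s(size_t request_bytes, double rate_bytes_per_s)
{
    return (double)request_bytes / rate_bytes_per_s;
}

/* Number of samples that fit in one request, given the sample rate in Hz. */
double samples_per_request(size_t request_bytes, double rate_bytes_per_s,
                           double sample_rate_hz)
{
    return request_latency_s(request_bytes, rate_bytes_per_s) * sample_rate_hz;
}
```

For a 2 MB request, request_latency_s(2 * 1024 * 1024, 6.5e6) comes out
at roughly a third of a second of buffered data, before any scheduling
or copying delay is added on top.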

3. I next followed up on the suggestion to use the asynchronous
transfers in libusb. So I generated a transfer structure and populated
it. The basic idea behind the asynchronous mode is that a callback
function is called when a transfer has been satisfied. So I made the
simplest callback I could: I basically just re-requested the transfer
in the callback. Unfortunately, what I didn't understand was that this
has essentially all the problems the synchronous function does, and
is actually _less_ efficient (for reasons I'll now describe).

4. I finally have something working. Here's the key: the goal is to
keep the list of URBs at the kernel constantly filled. Any time that
list empties, there's a chance it won't be refilled in time to request
data, and the FT2232H buffer will fill up. So the _right_ way to do the
asynchronous mode is to make a list of transfers. Each time one of
them returns, another is requested. There's overhead involved in
converting this request into the corresponding list of URBs, but as
long as the list starts out big enough, we can amortize this overhead
over the time the other transfers take to complete. So, on my desktop
machine, a list of 32 transfers of 2 kB each will never (that I've
seen) result in a buffer overflow. On my less efficient laptop, I have
to boost it to 128 transfers of 2 kB. The cool thing here is that my
latency is whatever is inherent in a 2 kB transfer (which ends up being
about 10 samples). Note that the minimum latency will come from the
512-byte fundamental USB high-speed packet size.
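
To see why the depth of the list matters, here is a rough model in C.
It is not libusb code (the real calls are libusb_alloc_transfer,
libusb_fill_bulk_transfer, libusb_submit_transfer and
libusb_handle_events). It only estimates how long the host can be busy
elsewhere before data is lost. The assumptions are mine: a steady
6.5 MB/s stream, a 4 KB on-chip buffer (the figure from the reply
quoted at the bottom of this message, which also suggests the effective
size in FIFO mode may be smaller), and a kernel that keeps servicing
whatever transfers it already holds. The function names are
hypothetical.

```c
#include <stddef.h>

/*
 * Longest host stall (in microseconds) that loses no data in this model:
 * the kernel keeps filling the `pending` transfers it already holds, and
 * once those run out the device buffers `fifo_bytes` more before it
 * overflows.
 */
double max_stall_us(size_t pending, size_t transfer_bytes,
                    size_t fifo_bytes, double rate_bytes_per_s)
{
    double absorbable = (double)pending * (double)transfer_bytes
                      + (double)fifo_bytes;
    return absorbable / rate_bytes_per_s * 1e6;
}

/* Returns 1 if a stall of `stall_us` is survivable under the same model,
 * 0 otherwise. */
int survives_stall(size_t pending, size_t transfer_bytes,
                   size_t fifo_bytes, double rate_bytes_per_s,
                   double stall_us)
{
    return stall_us <= max_stall_us(pending, transfer_bytes,
                                    fifo_bytes, rate_bytes_per_s);
}
```

With these numbers, 32 pending transfers of 2048 bytes tolerate a stall
of about 10.7 ms and 128 of them about 41 ms, while with nothing pending
(the situation between requests in steps 1 to 3) the margin is only
about 0.6 ms.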

5. The downside of this solution is that the amount of CPU
overhead goes up with the number of transfers requested (because of the
inefficiency of producing the list of URBs for the kernel). So in the
future, I'll use the info in the libftdi _and_ libusb structures to
just make my own list of URBs. In this case, memory only needs to be
allocated once...
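
The "allocate once" part could look something like the sketch below.
This is only an illustration of the idea in plain C, not the URB code
itself: the struct and function names are hypothetical, and how each
slot would be attached to a URB is left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* One contiguous block holding `count` buffers of `size` bytes each. */
struct buffer_pool {
    unsigned char *storage;
    size_t count;
    size_t size;
};

/* Allocates all buffers in a single malloc. Returns 0 on success, -1 on
 * bad arguments, size overflow, or allocation failure. */
int pool_init(struct buffer_pool *p, size_t count, size_t size)
{
    if (count == 0 || size == 0 || count > (size_t)-1 / size)
        return -1;
    p->storage = malloc(count * size);
    if (p->storage == NULL)
        return -1;
    p->count = count;
    p->size = size;
    return 0;
}

/* Returns the start of buffer `i`, or NULL if `i` is out of range. The
 * same slot is handed out again each time request `i` is resubmitted. */
unsigned char *pool_slot(const struct buffer_pool *p, size_t i)
{
    return i < p->count ? p->storage + i * p->size : NULL;
}

/* Releases the block and clears the pool. */
void pool_free(struct buffer_pool *p)
{
    free(p->storage);
    p->storage = NULL;
    p->count = 0;
    p->size = 0;
}
```

For example, pool_init(&pool, 32, 2048) would reserve the buffers for
the 32-transfer list from step 4 up front, and every resubmission would
then reuse pool_slot(&pool, i) instead of allocating again.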

thanks for all your help, and I hope this is helpful to someone else!

– caleb



On Mon, Jun 20, 2011 at 11:39 PM, Xiaofan Chen <xiaofanc@xxxxxxxxx> wrote:
> On Tue, Jun 21, 2011 at 2:05 PM, Caleb Kemere <ckemere@xxxxxxxxx> wrote:
>> On Mon, Jun 20, 2011 at 10:03 PM, Xiaofan Chen <xiaofanc@xxxxxxxxx> wrote:
>>> On Tue, Jun 21, 2011 at 8:09 AM, Caleb Kemere <ckemere@xxxxxxxxx> wrote:
>>>
>>> If you use synchronous API, the main way to help is to increase
>>> the bufferSize but then there is a limit. After all, the internal buffer
>>> of FTDI chips are limited (two buffers, each 512Bytes in the FIFO
>>> mode, for FT2232H, even though the RX/TX buffer is 4KB). So if
>>> the host fails to fire one USB IN request for 512B, you are still
>>> fine but if the host fails to fire two USB IN requests, you are done.
>>
