The problem is that digitally stretching audio, even just a little, creates noticeable artifacts: not quite clicking, not quite gargling, but annoying nonetheless. What I finally ended up doing was resetting the format from 44.1k to 48k and then pitch-shifting to try to match the duration of the video-- more or less, since the commentary tracks were not exactly the right length. It took a lot of trial and error: each change took something like 20 minutes to render, and then I had to try to sync up the background audio to the video. It was tedious and frustrating.
I started at a little past 7:00, and I finally wrapped a little after midnight.
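For anyone curious how small the correction in a job like this can be, here is a rough sketch of the arithmetic. The durations are made-up placeholders, not the actual lengths of my tracks, and the function names are just for illustration. It works out the speed factor needed to make an audio track fit a video, and how far the pitch would move if you got that speed change by plain resampling rather than a pitch-preserving stretch.

```python
import math

def speed_factor(audio_seconds, video_seconds):
    """Playback speed multiplier that makes the audio last exactly video_seconds."""
    return audio_seconds / video_seconds

def pitch_change_cents(factor):
    """Pitch shift (in cents) caused by changing speed via plain resampling."""
    return 1200 * math.log2(factor)

# Hypothetical lengths: a commentary track that runs 4 seconds long
# against a 92-minute video.
audio_len = 92 * 60 + 4.0
video_len = 92 * 60

factor = speed_factor(audio_len, video_len)
print(f"speed factor: {factor:.6f}")
print(f"pitch change if resampled: {pitch_change_cents(factor):+.2f} cents")
```

A factor above 1 means the audio has to play faster to fit. The point is that the correction is tiny as a percentage, which is exactly why it's so maddening that it takes a 20-minute render to find out whether you guessed right.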
"But marmot," you may ask, "Why didn't you record it with a converter that's locked to the video to begin with and save yourself the headache?"
Quite simply because the only genlocked A/D converters that I have are in my studio and not portable, and for various reasons I had to do this on location.
This is important to consider, since avindair has commented that any projects he wants to do in the future he wants to record dual-system (this is where you record the audio separately from the video), and I'm trying to talk him out of it as an unnecessary and time-consuming step, particularly at the speed at which he wants to shoot. With dual-system, you can't run-n-gun, you have to clap-slate every take, you need more people (I'm basing this on a separate boom operator and recordist; it's possible to combine them into one person if you have a good portable recorder), and you have to spend the time to re-sync the takes later. It may not seem like a lot to do for one take, but when you consider the hundreds (and sometimes thousands) of takes that you get on a feature film, it becomes a daunting task. I know, I've done it. And when the sample rates don't match, you will end up pulling out your hair. (For short takes, small sample rate mismatches aren't a problem-- it's the longer ones that are a PITA.)
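To put rough numbers on that last point, here is a back-of-the-envelope sketch. The 50 parts-per-million clock error and the 29.97 fps frame rate are assumptions I picked for illustration, not measurements from any particular recorder or camera. The idea is just that drift from a free-running clock grows linearly with the length of the take.

```python
def drift_seconds(take_seconds, ppm_error):
    """Sync drift built up over a take when the recorder's clock
    is off from the camera's by ppm_error parts per million."""
    return take_seconds * ppm_error / 1_000_000

FRAME = 1 / 29.97   # one video frame, assuming NTSC-rate video
PPM = 50            # assumed clock mismatch, for illustration only

takes = [
    ("30-second take", 30),
    ("10-minute take", 10 * 60),
    ("90-minute commentary", 90 * 60),
]

for label, seconds in takes:
    drift = drift_seconds(seconds, PPM)
    print(f"{label}: {drift * 1000:.1f} ms drift, {drift / FRAME:.2f} frames")
```

With these assumed figures, a short take stays well inside a single frame, while a feature-length track wanders off by several frames. That is the difference between something nobody notices and something you have to fix.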
There are some pros to it, primarily the sound quality. If you have a consumer or prosumer-level camera, the sound quality in recording is pretty much guaranteed to be left-handed crap: it's usable, but it needs a lot of processing to clean up noise and hiss, and sometimes you just can't clean things up.
Another advantage is that you're not tethered to the camera. If you are shooting genlocked, you do need a tether to the video, and this is likely one of those situations where you need a separate sound recordist and boom operator.
The downside is that it takes more time, both to shoot and in postproduction.