Well... Fuck. That is really impressive. I'm curious, how much effort and time did this take, disregarding software learning time, compared to drawing this with a more traditional frame rate?
I'm curious just how much labor effort and time this can save you.
That actually was really good. I'm sure I could quibble a bit over the facial expressions and lack of motion both there and with her (ahem) upper body structure, but overall, I think that went really, really well. I could see the "lightning scene" from Better This Way being really well depicted in this version...
On the technical side, these are images from the comic page, cropped out and run through CogVideo, then upscaled and image-to-image filtered in Stable Diffusion using a LoRA and hypernetwork that I trained on images of the morale officer. (The comic page itself also has backgrounds that are enhanced with Stable Diffusion: I sketched and shaded a rough background, then filtered it to add greeblies.) I then ran the result through RIFE to interpolate from 8fps to 16fps.
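To give a feel for what the 8fps to 16fps step does to the frame count, here is a minimal sketch. It is not RIFE: RIFE estimates motion between frames with a neural network, while this stand-in just averages neighbouring frames (a plain cross-fade). The function name `double_frame_rate` and the frames-as-lists-of-floats representation are made up for illustration.

```python
def double_frame_rate(frames):
    """Insert one blended frame between each consecutive pair of frames.

    Each frame is a flat list of pixel values (floats). This is a naive
    linear blend used only as a stand-in; it is NOT how RIFE works.
    """
    if not frames:
        return []
    out = [frames[0]]
    for prev, nxt in zip(frames, frames[1:]):
        # Midpoint of the two neighbouring frames, pixel by pixel.
        mid = [(a + b) / 2 for a, b in zip(prev, nxt)]
        out.append(mid)
        out.append(nxt)
    return out


# A 6 second clip at 8fps is 48 frames; doubling gives 2 * 48 - 1 = 95.
clip = [[float(i)] for i in range(48)]
doubled = double_frame_rate(clip)
print(len(clip), "->", len(doubled))
```

A real interpolator warps pixels along the estimated motion instead of blending them, which is why it avoids the ghosting a cross-fade produces on fast movement.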
This took me a couple of days of experimenting and messing around, with a large amount of compute time in between. It takes ~10 minutes to render a 6 second clip, so it's a matter of figuring out a prompt and parameters, running a batch overnight, then cherry-picking the best results.
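As a back-of-the-envelope check on those numbers, here is a small sketch of how many clips an overnight batch yields. The 10 minutes per clip and 6 seconds per clip come from the figures above; the 8 hour night and the function names are assumptions for illustration.

```python
def clips_per_batch(batch_hours, minutes_per_clip=10.0):
    """Number of whole clips that finish rendering in the given time."""
    return int(batch_hours * 60 // minutes_per_clip)


def footage_seconds(num_clips, seconds_per_clip=6.0):
    """Total seconds of raw footage across all rendered clips."""
    return num_clips * seconds_per_clip


# Assumed: an 8 hour overnight run.
clips = clips_per_batch(8)
print(clips, "clips,", footage_seconds(clips), "seconds of raw footage")
```

So one night gives a few dozen candidates to pick from, which is why the workflow is batch first and cherry-pick afterwards rather than iterating clip by clip.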
Hand animating this scene would take weeks, but the results and methods aren't really comparable. You're seeing this scene because it worked, where several other scenes did not. While I was generating clips, it was not clear that the walking shot was going to work at all as there were lots of derpy results, and even in the final clip I couldn't use the whole thing. But the final result is very realistic with shading and 3D camera motion that would be very hard to duplicate traditionally. The same with the shot of her walking down the stairs - it looks just like a handheld camera shot and I think you would need to have a motion capture source to duplicate that.
There are lots of things the model just can't do. Like having her walk into the hallway and turn down the stairs - not happening. It has to go forward from the image you provide, so it can't have a character walk into frame or hit a specific mark. If this model had the 2 keyframe targets that ToonCrafter has it would be amazing, but I don't know if that's possible. Also, just from some preliminary testing, I don't think it can make the princess from Better This Way jump up in surprise in the lightning scene.
It favors realistic shading - if you give it a flat shaded character you can get limited animation or lots of distortion. I've had mixed results on some of my characters, but it seems to pick up well on this style of the morale officer. The output from CogVideo is very smooth, but it can do some weird things to faces and the overall shading style, so I ran it through Stable Diffusion, which has its own problems and adds some flickering.
Re: Jiggle physics, I've gotten some OK results by feeding it photorealistic Stable Diffusion renders of girls in loose t-shirts. Here are some misc results for science: https://satinminions.com/Cog-Misc-Results.html
The model seems to shine at realistic scenes with minimal movement. You get lots of subtle secondary motion and it holds the character's proportions well. I think there might be a hybrid use where you draw a character, generate an animation, then roto over it. Also, for backgrounds and establishing shots, you can add rotation and animation to what would otherwise be an ordinary pan. Another case that seems to work with a little fiddling is technobabble displays, like the brain scan and graph stuff that appears on the morale officer's HUD. So there are use cases, but it's not the end times yet.