Why do some games make a larger video file than others in recording?

I have a video in Minecraft that is 1 hour, 12 minutes and it is 4.89 GB, while I also have a video of Arma 3 that is 1 hour and 12 minutes long, which is 10.31 GB. Why?

I'd guess that's got to do with Minecraft having lower-resolution textures, what with it being voxel based. More detailed textures are harder to compress, so they end up taking more space. But then again, maybe you're running Arma at a higher screen resolution; that would also make a difference. And if Minecraft's frame rate were capped at, say, 30 fps, whereas Arma's wasn't capped at all and was averaging higher than 30 fps, that would make a difference as well.

Everything else being equal, I'd guess it's the textures.

There are pretty much 6 key things that will affect the storage size of a video.

* Screen Resolution
* Framerate
* Complexity of the Image
* The amount of change per frame
* Rendering Method (Progressive or Interlaced)
* Bitrate of recording

The first two (Resolution & Framerate) are pretty simple to understand. The higher the resolution & the more fps you have, the more bytes you'll need to record it.

The other 4 are a bit more complex, but these can make a world of difference… So I'll go into more detail here…

* COMPLEXITY OF THE IMAGE refers to how detailed the image is. Video codecs like to "group" pixels together as much as possible, as it helps to simplify (or compress) the information they need to note, sometimes grouping similarly colored pixels together as needed. A simple image (like a smiley face) is a lot easier to compress than a high-resolution image of a forest where you can see the individual blades of grass on the ground.

A checkerboard pattern is a prime example here for both simplicity & complexity. A checkerboard pattern with squares that are 100 pixels wide will be easier to compress than one with squares that are 10 pixels wide (as there are 10 times as many edges along each row & column) & easier still than one with squares that are only 1 pixel wide (100 times as many edges as the original pattern).
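To get a feel for the "simple vs. complex" point, here's a rough sketch. It does NOT use a video codec; it uses Python's general-purpose `zlib` compressor as a stand-in (that's my assumption, purely for illustration), and the 320 x 240 frame size is made up. The idea is the same though: a frame with no detail shrinks to almost nothing, while a frame where every pixel is unrelated to its neighbors barely shrinks at all.

```python
import random
import zlib

WIDTH, HEIGHT = 320, 240  # hypothetical frame size, one byte (grey shade) per pixel


def compressed_size(pixels: bytes) -> int:
    """Return the size in bytes of the zlib-compressed pixel data."""
    return len(zlib.compress(pixels))


# A "simple" frame: every pixel is the same shade of grey.
flat_frame = bytes([128]) * (WIDTH * HEIGHT)

# A "complex" frame: every pixel is an unrelated random shade.
rng = random.Random(0)  # fixed seed so the run is repeatable
noisy_frame = bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT))

flat_size = compressed_size(flat_frame)
noisy_size = compressed_size(noisy_frame)
print(f"raw: {WIDTH * HEIGHT} bytes, flat: {flat_size}, noisy: {noisy_size}")
```

Real game footage sits somewhere between those two extremes, and a real codec is far smarter than `zlib`, so treat this as a direction, not a measurement.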

* AMOUNT OF CHANGE PER FRAME refers to how many pixels actually changed between the last frame & the next frame. When video is recorded digitally, the video codec doesn't record the entire frame every time. The video codec will compare the current frame with the last frame & only store what changed, skipping the pixels that stayed the same, to simplify (& compress) the information it needs to keep. The video codec will still occasionally record an entire frame (known as a "keyframe"), but will typically do so at a certain threshold (like 50% or more of the frame changed), at specific times (like once every 30 frames) or a combination of both.

If you move quickly or there's a lot of action happening on the screen, the video codec has to grab more keyframes (increasing the video size). If you move slowly or idle, the video codec can get away with grabbing small partial "update" frames most of the time.
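Here's a toy version of that "only store what changed" idea. Big caveat: real codecs like H.264 work on blocks with motion prediction, not pixel-by-pixel lists, so the function names and the (index, value) format below are made up just to illustrate the principle.

```python
def encode_delta(previous, current):
    """Record only the (index, value) pairs where the pixel changed."""
    return [
        (i, new)
        for i, (old, new) in enumerate(zip(previous, current))
        if old != new
    ]


def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame


key_frame = [0] * 16          # a full frame of 16 black pixels
next_frame = list(key_frame)
next_frame[5] = 255           # a single pixel turns white

delta = encode_delta(key_frame, next_frame)
print(delta)  # [(5, 255)]
```

One changed pixel means one entry to store instead of sixteen. The more of the frame that changes, the longer that list gets, until storing a whole new frame is the cheaper option.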

Going back to the checkerboard pattern. Let's assume it's moving to the right at a rate of 1 pixel per frame…

With the 100 pixel squares, 1% of each frame is changing (1 pixel at every vertical edge, & there's an edge every 100 pixels)…
With the 10 pixel squares, 10% of each frame is changing…
With the 1 pixel squares, 100% of each frame is changing (forcing a keyframe capture)
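If you want to check the share of pixels that change, here's a brute-force count. The 400 x 400 frame size and the function names are just picked for the example. It compares every pixel before & after a 1 pixel shift to the right, and by this count only the pixels sitting on a vertical edge change, which is 1 pixel in every `square`.

```python
def checker(x, y, square):
    """Colour (0 or 1) of pixel (x, y) in a checkerboard of `square`-pixel squares."""
    return ((x // square) + (y // square)) % 2


def changed_fraction(square, width=400, height=400):
    """Fraction of pixels that differ after the pattern shifts right by 1 pixel."""
    changed = 0
    for y in range(height):
        for x in range(width):
            # After the shift, pixel (x, y) shows what used to be at (x - 1, y).
            if checker(x, y, square) != checker(x - 1, y, square):
                changed += 1
    return changed / (width * height)


for square in (100, 10, 1):
    print(square, changed_fraction(square))
```

So the finer the pattern, the more of the frame has to be updated for the exact same movement.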

* RENDERING METHOD refers to HOW the video is, well, rendered. This typically comes in one of two methods: Progressive & Interlaced.

Progressive rendering is the most straightforward, as the whole frame is rendered every time.

Interlaced rendering is a little more obscure, as it renders every OTHER horizontal line in a frame, requiring 2 frames to render a whole picture (as one frame would render the odd lines & the next frame would render the even lines). This effectively cuts the number of complete pictures per second in half, but lowers the processing requirements.
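As a small sketch of the odd/even line split (the function names here are my own, and it assumes the frame has an even number of lines), each interlaced pass carries only half of the picture, and weaving two passes back together gives you the whole thing:

```python
def split_fields(frame):
    """Split a frame (a list of rows) into its two interlaced halves."""
    first_pass = frame[0::2]   # rows 0, 2, 4, ...
    second_pass = frame[1::2]  # rows 1, 3, 5, ...
    return first_pass, second_pass


def weave(first_pass, second_pass):
    """Interleave the two halves back into one full picture."""
    frame = []
    for a_row, b_row in zip(first_pass, second_pass):
        frame.append(a_row)
        frame.append(b_row)
    return frame


full_frame = [[row] * 4 for row in range(6)]  # 6 rows, 4 pixels each
first_pass, second_pass = split_fields(full_frame)
print(len(first_pass), len(second_pass))  # 3 3
```

Each pass holds half the rows, which is where the "divide by 2" for interlacing comes from when working out raw data rates.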

* BITRATE OF RECORDING refers to the maximum number of bits (&, in turn, bytes) dedicated per second of recording. This figure limits the quality of image per frame of video & how "sharp" we can really get. You'll generally see bitrates noted as Variable (VBR), which provides flexibility (going lower when the content permits), or Constant (CBR), which sets a hard number. This is measured in "bits" per second (bps).

Going with a Constant Bitrate, using 60 FPS for our framerate & progressive rendering…

> 60 bps means each frame has one bit of data
>> The 1 block (entire frame) is black or white

> 480 bps means each frame has 8 bits (1 byte) of data
>> 8 Blocks of black or white
>> The 1 Block (entire frame) is one shade of grey

> 1,440 bps (1.44 Kbps) means each frame has 24 bits (3 bytes) of data
>> 24 Blocks of black or white
>> 3 Blocks of grey
>> 1 Block (entire frame) of any color (as a mixture of Red, Green & Blue)

While I know these are ludicrously low bitrates, it shows how little can be done with raw data. I will note that "blocks" may be slightly misleading, as I was going with relative size to your frame of reference. To bring this to the logical conclusion, each "block" would be a "pixel".
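The block counts above are just integer division, so here's a quick sketch to reproduce them (the function name is made up; 1, 8 & 24 bits per block stand for black/white, grey & full color respectively):

```python
FPS = 60  # the framerate used in the examples above


def blocks_per_frame(bitrate_bps, bits_per_block, fps=FPS):
    """How many blocks of a given bit depth fit in one frame's share of the bitrate."""
    bits_per_frame = bitrate_bps // fps
    return bits_per_frame // bits_per_block


for bitrate in (60, 480, 1440):
    counts = {depth: blocks_per_frame(bitrate, depth) for depth in (1, 8, 24)}
    print(bitrate, counts)
# 60 {1: 1, 8: 0, 24: 0}
# 480 {1: 8, 8: 1, 24: 0}
# 1440 {1: 24, 8: 3, 24: 1}
```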

So if we wanted to render a small video in color (320 x 240 interlaced… Or 240i by today's standards)… That would be 320 x 240 (resolution of the video) x 24 (color) x 60 (fps) / 2 (interlacing) = 55,296,000 bps OR 55.296 Mbps (= 6.912 MBps) in a raw format, which is very inefficient. Thankfully, we have video codecs (like H.264) with all these "shortcuts" that can compress the video data & allow us to use lower bitrates (like 2 Mbps for online streaming) as a result.

Since the bitrate notes how much data is dedicated to each second of video, the video codec can devote more data to the keyframes (where it needs to render an entire frame) & skimp on the intermediary frames between them. This is also the reason why video streams that have a moving checkerboard pattern or a lot of action on the screen tend to get blocky (or pixelated), as the video codec is struggling to render the video within the bitrate limitations (as there's too much stuff that has to be updated between frames).
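That raw-bitrate arithmetic can be written out as a small helper if you want to plug in other resolutions. The function name & parameters are mine, not from any library:

```python
def raw_bitrate_bps(width, height, bits_per_pixel, fps, interlaced=False):
    """Uncompressed video bitrate in bits per second."""
    bits_per_frame = width * height * bits_per_pixel
    if interlaced:
        # Each interlaced pass carries only every other line.
        bits_per_frame //= 2
    return bits_per_frame * fps


bps = raw_bitrate_bps(320, 240, 24, 60, interlaced=True)
print(bps)            # 55296000
print(bps / 1e6)      # 55.296 (Mbps)
print(bps / 8 / 1e6)  # 6.912 (MBps)
```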

I know this has gotten a bit technical, but it shows how videos of the same play time can require different amounts of storage space, even when all settings are the same & assuming the content being recorded is similar in nature. Minecraft with its blocky graphics & low-resolution textures isn't that complex to record & doesn't update as many pixels per frame compared to ARMA III with its more realistic graphics & higher resolution textures.
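To put rough numbers on the two recordings from the question: file size is basically average bitrate times duration, so you can work backwards. This sketch assumes 1 GB = 10^9 bytes (your file manager may count differently) & ignores the audio track, so take the results as ballpark figures.

```python
def average_bitrate_mbps(size_gb, hours, minutes):
    """Average bitrate in Mbps for a file of `size_gb` with the given duration."""
    seconds = hours * 3600 + minutes * 60
    bits = size_gb * 1e9 * 8  # assumption: 1 GB = 10^9 bytes
    return bits / seconds / 1e6


minecraft = average_bitrate_mbps(4.89, 1, 12)
arma = average_bitrate_mbps(10.31, 1, 12)
print(f"Minecraft: {minecraft:.1f} Mbps, Arma 3: {arma:.1f} Mbps")
# Minecraft: 9.1 Mbps, Arma 3: 19.1 Mbps
```

Same length, roughly double the average bitrate. If both were recorded with identical settings, that would suggest the recorder was using a variable or quality-based bitrate & simply needed more data to describe the Arma footage.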

Hope this sheds some light on the subject for you.