This is a look into how certain EXEPACK-related programs handle the min_extra_paragraphs field in the EXE header. This field is also known as e_minalloc in IMAGE_DOS_HEADER terms.
I found a copy of EXEPACK.EXE at the PCjs page for Microsoft Macro Assembler 4.00. There are various other versions available. To download the disk image, select the dropdown for the B: drive, ensure MS Macro Assembler 4.00 is selected, click Load, then Save.
WinWorld is another source for disk images. The Internet Archive has a "Microsoft MASM 4 beta", whose differences from 4.00 I did not examine thoroughly.
Use Mtools to examine and extract the disk image:
    $ mdir -i MASM-016014-400.img
     Volume in drive : has no label
    Directory for ::/

    MASM     EXE     85566 1985-10-16   4:00
    LINK     EXE     43988 1985-10-16   4:00
    SYMDEB   EXE     37021 1985-10-16   4:00
    MAPSYM   EXE     18026 1985-10-16   4:00
    CREF     EXE     15028 1985-10-16   4:00
    LIB      EXE     28716 1985-10-16   4:00
    MAKE     EXE     24300 1985-10-16   4:00
    EXEPACK  EXE     10848 1985-10-16   4:00
    EXEMOD   EXE     11034 1985-10-16   4:00
    COUNT    ASM      5965 1985-10-16   4:00
    README   DOC      7630 1985-10-16   4:00
           11 files             288 122 bytes
                                 69 632 bytes free

    $ mkdir MASM-016014-400
    $ cd MASM-016014-400
    $ mcopy -i ../MASM-016014-400.img -s ::/ ./
Inside DOSBox or similar, you can run the programs and see the version numbers.
    C:\>MASM.EXE
    Microsoft (R) Macro Assembler Version 4.00
    Copyright (C) Microsoft Corp 1981, 1983, 1984, 1985. All rights reserved.

    C:\>LINK.EXE
    Microsoft (R) 8086 Object Linker Version 3.05
    Copyright (C) Microsoft Corp 1983, 1984, 1985. All rights reserved.

    C:\>EXEPACK.EXE
    Microsoft (R) EXE File Compression Utility Version 4.00
    Copyright (C) Microsoft Corp 1985. All rights reserved.
The disk image conveniently comes with a sample program, COUNT.ASM. Let's EXEPACK-compress it two ways, using EXEPACK.EXE and the /EXEPACK option to LINK.EXE.
    C:\>MASM.EXE COUNT.ASM,COUNT.OBJ;
    C:\>LINK.EXE COUNT.OBJ,COUNT.EXE;
    C:\>EXEPACK.EXE COUNT.EXE COUNTE.EXE
    C:\>LINK.EXE /EXEPACK COUNT.OBJ,COUNTL.EXE;
The two compressed files are not identical. Using Rabin2 and Radiff2, we see that there are only trivial differences: the checksum field at offset 0x12 (0x1399 in COUNTE.EXE, 0x0000 in COUNTL.EXE), and a single byte at offset 0x301 (0x00 in COUNTE.EXE, 0xc3 in COUNTL.EXE; 0xc3 is the opcode of a ret instruction).

    $ du -b COUNT*.EXE
    3081    COUNT.EXE
    1092    COUNTE.EXE
    1092    COUNTL.EXE
    $ sha256sum COUNT*.EXE
    10e86814a369a9cf12e7d0ea6930fdf3184692e4cdddae7627aea9ba0add4624  COUNT.EXE
    ab629d01a7e99e20153b6dd85c87f5adb9fa211c4daa2a6cc67cc12772973ba1  COUNTE.EXE
    548bc5075fc8e98acf2f53e903619ca7e9595a543618fb9a75b45621743bf1b5  COUNTL.EXE
    $ rabin2 -H COUNT.EXE
    [0000:0000]  Signature           MZ
    [0000:0002]  BytesInLastBlock    0x0009
    [0000:0004]  BlocksInFile        0x0007
    [0000:0006]  NumRelocs           0x0001
    [0000:0008]  HeaderParagraphs    0x0020
    [0000:000a]  MinExtraParagraphs  0x0000
    [0000:000c]  MaxExtraParagraphs  0xffff
    [0000:000e]  InitialSs           0x0000
    [0000:0010]  InitialSp           0x0100
    [0000:0012]  Checksum            0xfdf4
    [0000:0014]  InitialIp           0x000c
    [0000:0016]  InitialCs           0x0094
    [0000:0018]  RelocTableOffset    0x001e
    [0000:001a]  OverlayNumber       0x0000
    $ rabin2 -H COUNTE.EXE
    [0000:0000]  Signature           MZ
    [0000:0002]  BytesInLastBlock    0x0044
    [0000:0004]  BlocksInFile        0x0003
    [0000:0006]  NumRelocs           0x0000
    [0000:0008]  HeaderParagraphs    0x0020
    [0000:000a]  MinExtraParagraphs  0x0098
    [0000:000c]  MaxExtraParagraphs  0xffff
    [0000:000e]  InitialSs           0x00b5
    [0000:0010]  InitialSp           0x0080
    [0000:0012]  Checksum            0x1399
    [0000:0014]  InitialIp           0x0010
    [0000:0016]  InitialCs           0x0011
    [0000:0018]  RelocTableOffset    0x001e
    [0000:001a]  OverlayNumber       0x0000
    $ rabin2 -H COUNTL.EXE
    [0000:0000]  Signature           MZ
    [0000:0002]  BytesInLastBlock    0x0044
    [0000:0004]  BlocksInFile        0x0003
    [0000:0006]  NumRelocs           0x0000
    [0000:0008]  HeaderParagraphs    0x0020
    [0000:000a]  MinExtraParagraphs  0x0098
    [0000:000c]  MaxExtraParagraphs  0xffff
    [0000:000e]  InitialSs           0x00b5
    [0000:0010]  InitialSp           0x0080
    [0000:0012]  Checksum            0x0000
    [0000:0014]  InitialIp           0x0010
    [0000:0016]  InitialCs           0x0011
    [0000:0018]  RelocTableOffset    0x001e
    [0000:001a]  OverlayNumber       0x0000
    $ radiff2 COUNTE.EXE COUNTL.EXE
    0x00000012 9913 => 0000 0x00000012
    0x00000301 00 => c3 0x00000301
Either way, compression has changed the value of the min_extra_paragraphs field (which Rabin2 calls MinExtraParagraphs) from 0x0000 to 0x0098 (152 decimal). Where does this come from?
The formula for the size of the program text, in bytes, is

    512×(blocks_in_file − 1) + bytes_in_last_block − 16×header_paragraphs

The formula for the size of the additional memory, in bytes, is

    16×min_extra_paragraphs

Adding these two values together gives the total runtime size of the program.
file | program size (bytes) | extra size (bytes) | total size (bytes) |
---|---|---|---|
COUNT.EXE | 2569 | 0 | 2569 |
COUNTE.EXE | 580 | 2432 | 3012 |
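The two size formulas can be checked with a short Python sketch. The helper names `parse_mz_header` and `sizes` are made up for this illustration, and the field names follow the ones used in the text rather than any particular library. One assumption goes beyond the formula as written: a `bytes_in_last_block` of 0 is treated as a full 512-byte block, which is the usual MZ convention but does not occur in these files.

```python
import struct

# The first 28 bytes of an MZ header: a 2-byte signature followed by
# thirteen little-endian 16-bit words, in this order.
MZ_FIELDS = (
    "signature", "bytes_in_last_block", "blocks_in_file", "num_relocs",
    "header_paragraphs", "min_extra_paragraphs", "max_extra_paragraphs",
    "initial_ss", "initial_sp", "checksum", "initial_ip", "initial_cs",
    "reloc_table_offset", "overlay_number",
)


def parse_mz_header(data):
    """Return the MZ header fields found at the start of `data` as a dict."""
    return dict(zip(MZ_FIELDS, struct.unpack_from("<2s13H", data, 0)))


def sizes(header):
    """Return (program size, extra size, total size), all in bytes."""
    # Assumption: 0 in bytes_in_last_block means the last block is full.
    last_block = header["bytes_in_last_block"] or 512
    program = (512 * (header["blocks_in_file"] - 1) + last_block
               - 16 * header["header_paragraphs"])
    extra = 16 * header["min_extra_paragraphs"]
    return program, extra, program + extra
```

Fed the COUNTE.EXE header values from the Rabin2 dump (BlocksInFile 3, BytesInLastBlock 0x44, HeaderParagraphs 0x20, MinExtraParagraphs 0x98), `sizes` returns `(580, 2432, 3012)`, which agrees with the table.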
The difference in program sizes accounts for 124 of the 152 paragraphs in the min_extra_paragraphs of COUNTE.EXE. The remaining 28 paragraphs come from the size of the EXEPACK block itself and its 8-paragraph stack (see initial_sp).

In this case, min_extra_paragraphs had to increase to account for the overhead of the EXEPACK block. But if the original min_extra_paragraphs is large enough, the EXEPACK block can make use of the same space, and therefore the difference in min_extra_paragraphs is simply the difference in program sizes.
The formula used by EXEPACK.EXE to compute the new min_extra_paragraphs is:

    out.min_extra_paragraphs = in.program_paragraphs + max(in.min_extra_paragraphs, exepack_paragraphs + 8) − out.program_paragraphs
The formula comes from reverse engineering part of the program:
    fix_exe_header:
    0be6  55            push bp
    0be7  8bec          mov bp, sp
    0be9  b80a00        mov ax, 10
                        ; Reserve space for local variables.
                        ; bp-10 uint16_t exepack_paragraphs
                        ; bp-8  uint16_t out_file_size_low
                        ; bp-6  uint16_t out_file_size_high
    0bec  e8ed04        call stack_check
    0bef  57            push di
    0bf0  56            push si
    0bf1  a1c42b        mov ax, word [exepack_size]
    0bf4  050f00        add ax, 15
    0bf7  b104          mov cl, 4
    0bf9  d3e8          shr ax, cl
    0bfb  8946f6        mov word [exepack_paragraphs], ax
                        ; exepack_paragraphs = (exepack_size+15)/16
    0bfe  b80200        mov ax, 2
    0c01  50            push ax
    0c02  2bc0          sub ax, ax
    0c04  50            push ax
    0c05  50            push ax
    0c06  ff36bc2b      push word [out_fd]
    0c0a  e8db06        call file_seek
    0c0d  83c408        add sp, 8
    0c10  8946f8        mov word [out_file_size_low], ax
    0c13  8956fa        mov word [out_file_size_high], dx
                        ; out_file_size = file_seek(out_fd, 0, 0, SEEK_END)
    0c16  80e401        and ah, 1
    0c19  a3a802        mov word [out_bytes_in_last_block], ax
                        ; out_bytes_in_last_block = out_file_size % 512
    0c1c  8b46f8        mov ax, word [out_file_size_low]
    0c1f  05ff01        add ax, 511
    0c22  83d200        adc dx, 0
    0c25  b109          mov cl, 9
    0c27  e87005        call shr_long
    0c2a  a3aa02        mov word [out_blocks_in_file], ax
                        ; out_blocks_in_file = (out_file_size+511)/512
    0c2d  a1c02b        mov ax, word [in_exe_size_low]
    0c30  8b16c22b      mov dx, word [in_exe_size_high]
    0c34  b104          mov cl, 4
    0c36  e86105        call shr_long
                        ; dx:ax = in_exe_size/16
    0c39  8b4ef6        mov cx, word [exepack_paragraphs]
    0c3c  03c8          add cx, ax
    0c3e  890eb402      mov word [out_ss], cx
                        ; out_ss = in_exe_size/16 + exepack_paragraphs
    0c42  c706b6028000  mov word [out_sp], 0x80
                        ; out_sp = 0x80
    0c48  a15007        mov ax, word [compressed_paragraphs]
    0c4b  0106bc02      add word [out_cs], ax
                        ; out_cs += compressed_paragraphs
    0c4f  a1c02b        mov ax, word [in_exe_size_low]
    0c52  8b16c22b      mov dx, word [in_exe_size_high]
    0c56  b104          mov cl, 4
    0c58  e83f05        call shr_long
                        ; dx:ax = in_exe_size/16
    0c5b  8b4ef6        mov cx, word [exepack_paragraphs]
    0c5e  83c108        add cx, 8
                        ; cx = exepack_paragraphs+8
    0c61  8bf8          mov di, ax
    0c63  3b0e5c07      cmp cx, word [in_min_extra_paragraphs]
                        ; exepack_paragraphs+8 >= in_min_extra_paragraphs?
    0c67  7305          jae l1
                        ; in_min_extra_paragraphs is greater.
    0c69  a15c07        mov ax, word [in_min_extra_paragraphs]
                        ; ax = in_min_extra_paragraphs
    0c6c  eb06          jmp set_min_extra_paragraphs
    l1:
                        ; exepack_paragraphs+8 is greater or equal.
    0c6e  8b46f6        mov ax, word [exepack_paragraphs]
    0c71  050800        add ax, 8
                        ; ax = exepack_paragraphs+8
    set_min_extra_paragraphs:
    0c74  2b46f6        sub ax, word [exepack_paragraphs]
                        ; ax -= exepack_paragraphs
    0c77  03c7          add ax, di
                        ; ax += in_exe_size/16
    0c79  2b065007      sub ax, word [compressed_paragraphs]
                        ; ax -= compressed_paragraphs
                        ; out_min_extra_paragraphs = in_exe_size/16
                        ;   + max(in_min_extra_paragraphs, exepack_paragraphs+8)
                        ;   - (compressed_paragraphs + exepack_paragraphs)
    0c7d  a3b002        mov word [out_min_extra_paragraphs], ax
    0c80  a15e07        mov ax, word [in_max_extra_paragraphs]
    0c83  a3b202        mov word [out_max_extra_paragraphs], ax
                        ; out_max_extra_paragraphs = in_max_extra_paragraphs
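As a cross-check, the header arithmetic in the listing can be transcribed into a few lines of Python. This is only a sketch of the steps annotated above, not EXEPACK.EXE's actual source; the function name mirrors the label in the disassembly and the returned dict is a convenience of this example. The sample inputs in the note below the code are not stated anywhere in the text: they are inferred from the COUNTE.EXE header and should be treated as assumptions.

```python
def fix_exe_header(exepack_size, out_file_size, in_exe_size,
                   compressed_paragraphs, in_min_extra_paragraphs):
    """Mirror the annotated steps of fix_exe_header.

    exepack_size, out_file_size and in_exe_size are byte counts; the
    other two arguments are in 16-byte paragraphs.
    """
    # The EXEPACK block size is rounded up to whole paragraphs.
    exepack_paragraphs = (exepack_size + 15) // 16
    # shr_long is a plain right shift, so this division truncates.
    in_paragraphs = in_exe_size // 16
    # Room for whichever is larger: the original extra memory, or the
    # EXEPACK block plus its 8-paragraph (0x80-byte) stack.
    needed = max(in_min_extra_paragraphs, exepack_paragraphs + 8)
    min_extra = (needed - exepack_paragraphs + in_paragraphs
                 - compressed_paragraphs)
    return {
        "bytes_in_last_block": out_file_size % 512,
        "blocks_in_file": (out_file_size + 511) // 512,
        "initial_ss": in_paragraphs + exepack_paragraphs,
        "initial_sp": 0x80,
        # The original works in a 16-bit register, hence the mask.
        "min_extra_paragraphs": min_extra & 0xFFFF,
    }
```

Assuming a 308-byte EXEPACK block, a 1092-byte output file, a 2576-byte input image, 17 compressed paragraphs, and an input min_extra_paragraphs of 0, the sketch yields bytes_in_last_block 0x44, blocks_in_file 3, initial_ss 0xb5, and min_extra_paragraphs 0x98, the same values Rabin2 reports for COUNTE.EXE.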
Microsoft EXEPACK.EXE will refuse to run if the output file would be bigger than the input file. My exepack program does support that case, though, so it uses a slightly more complicated formula (which is equivalent in the case that out.program_paragraphs ≤ in.program_paragraphs):

    out.min_extra_paragraphs = max(in.program_paragraphs + in.min_extra_paragraphs, in.program_paragraphs + exepack_paragraphs + 8, out.program_paragraphs + exepack_paragraphs + 8) − out.program_paragraphs
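The claimed equivalence is easy to test numerically. The sketch below writes both formulas as plain Python functions (the function names are invented for this example, and it reads the second formula as a three-way max with out.program_paragraphs subtracted at the end), so that they can be compared over a range of inputs.

```python
STACK_PARAGRAPHS = 8  # the 0x80-byte stack that follows the EXEPACK block


def min_extra_simple(in_program, in_min_extra, out_program, exepack):
    """EXEPACK.EXE's formula; every argument is in paragraphs."""
    return (in_program + max(in_min_extra, exepack + STACK_PARAGRAPHS)
            - out_program)


def min_extra_general(in_program, in_min_extra, out_program, exepack):
    """The three-way max formula; every argument is in paragraphs."""
    total = max(
        in_program + in_min_extra,
        in_program + exepack + STACK_PARAGRAPHS,
        out_program + exepack + STACK_PARAGRAPHS,
    )
    return total - out_program
```

With the COUNT.EXE values used earlier (161, 0, 37, 20), both functions return 152. They diverge only when the output is larger than the input: for the made-up inputs (10, 0, 30, 20), the simple formula gives 8 while the general one gives 28, which still leaves room for the EXEPACK block and its stack.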
When UNP decompresses a file, it sets min_extra_paragraphs according to the formula

    out.min_extra_paragraphs = max(0x1000, in.program_paragraphs + 512 + in.min_extra_paragraphs) − 512 − out.program_paragraphs

In the case of a largish program that has in.program_paragraphs + in.min_extra_paragraphs ≥ 0x1000 − 512, the formula simplifies to

    out.min_extra_paragraphs = in.program_paragraphs + in.min_extra_paragraphs − out.program_paragraphs
This computation can be read from the file u4.asm in the UNP source code. The MoreStrucInfo label sets TotalMem = max(0x1000, in_ExeImageSz/16 + EXTRAMEM + in_MinParMem):
    MoreStrucInfo:
            ; ...
            mov     ds,SegEHInfo.A
            ASSUME  ds:NOTHING
            mov     ax,ExeImageSz
            mov     dx,ExeImageSz+2
            div     ParSize                 ;; ax = ExeImageSz / 16
            xor     dx,dx
            add     ax,EXTRAMEM             ;; ax += 512
            adc     dl,0
            add     ax,ds:[MinParMem]       ;; ax += MinParMem
            adc     dl,0
            or      dx,dx                   ; size above 1Mb ?
            jne     LoadError
            cmp     ax,01000h               ; 64K?
            jae     UseMem
            mov     ax,01000h               ;; ax = 0x1000
    UseMem:
            mov     TotalMem,ax             ;; TotalMem = ax
The CalcSize label then does out_MinParMem = TotalMem − EXTRAMEM − (out_ExeImageSz+1)/16:
    CalcSize:
            mov     ax,ProgFinalSeg         ; calculate new image size
            xor     dx,dx
            sub     ax,SegProgram
            sbb     dx,0
            mov     cx,4
    LongMul16:
            shl     ax,1
            rcl     dx,1
            loop    LongMul16               ;; dx:ax = (ProgFinalSeg - SegProgram) * 16
            add     ax,ProgFinalOfs
            adc     dx,0                    ;; dx:ax += ProgFinalOfs
            add     ax,ExeSizeAdjust
            adc     dx,[ExeSizeAdjust+2]    ;; dx:ax += 1 (not sure what this is for)
            mov     ExeImageSz,ax
            mov     ExeImageSz+2,dx
            div     ParSize
            xchg    ax,bx                   ;; bx = ExeImageSz/16
            mov     ax,TotalMem             ;; ax = TotalMem
            sub     ax,EXTRAMEM             ;; ax -= EXTRAMEM
            sub     ax,bx                   ;; ax -= ExeImageSz/16
            cmp     ax,0A000h
            jb      MinMemOk
            xor     ax,ax                   ; no minimal memory
    MinMemOk:
            cmp     HeaderStored,0
            jne     _label01
            mov     es:[MinParMem],ax       ;; MinParMem = ax
See it in action using the compressed COUNTE.EXE from the EXEPACK.EXE section:
    C:\>UNP.EXE -v COUNTE.EXE COUNTEU.EXE
    UNP 4.11 Executable file restore utility, written by Ben Castricum, 05/30/95
    INFO - DOS Version 5.00
    INFO - Commandline = "E -I -K+ -U -V COUNTE.EXE COUNTEU.EXE".
    INFO - Using UNPTEMP$.$$$ as temp file.
    INFO - Wildcard matches 1 filename(s), stored at 0000h.
    INFO - Program loaded at 0192h, largest free memory block: 632123 bytes.

    processing file : COUNTE.EXE
    DOS file size   : 1092
    file-structure  : executable (EXE)
    EXE part sizes  : header 512 bytes, image 580 bytes, overlay 0 bytes
    INFO - File uses 0 fixups and requires atleast 3012 bytes to load.
    INFO - Loading program at 1010h, blocksize 65536 bytes.
    INFO - Required mem. 0098h, desired mem. FFFFh, header slack 484 bytes.
    processed with  : EXEPACK V4.00
    action          : decompressing... done
    new file size   : 2608
    writing to file : COUNTEU.EXE

    $ rabin2 -H COUNTE.EXE
    [0000:0000]  Signature           MZ
    [0000:0002]  BytesInLastBlock    0x0044
    [0000:0004]  BlocksInFile        0x0003
    [0000:0006]  NumRelocs           0x0000
    [0000:0008]  HeaderParagraphs    0x0020
    [0000:000a]  MinExtraParagraphs  0x0098
    [0000:000c]  MaxExtraParagraphs  0xffff
    [0000:000e]  InitialSs           0x00b5
    [0000:0010]  InitialSp           0x0080
    [0000:0012]  Checksum            0x1399
    [0000:0014]  InitialIp           0x0010
    [0000:0016]  InitialCs           0x0011
    [0000:0018]  RelocTableOffset    0x001e
    [0000:001a]  OverlayNumber       0x0000
    $ rabin2 -H COUNTEU.EXE
    [0000:0000]  Signature           MZ
    [0000:0002]  BytesInLastBlock    0x0030
    [0000:0004]  BlocksInFile        0x0006
    [0000:0006]  NumRelocs           0x0001
    [0000:0008]  HeaderParagraphs    0x0002
    [0000:000a]  MinExtraParagraphs  0x0d5f
    [0000:000c]  MaxExtraParagraphs  0xffff
    [0000:000e]  InitialSs           0x0000
    [0000:0010]  InitialSp           0x0100
    [0000:0012]  Checksum            0x1399
    [0000:0014]  InitialIp           0x000c
    [0000:0016]  InitialCs           0x0094
    [0000:0018]  RelocTableOffset    0x001c
    [0000:001a]  OverlayNumber       0x0000
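TotalMem is set to max(0x1000, 580/16 + 512 + 0x98) = max(0x1000, 36 + 512 + 152) = 0x1000. The size of this program is below UNP's minimum memory threshold. Then out_MinParMem becomes 0x1000 − 512 − 161 = 0x0d5f, where 161 is the 2576-byte decompressed image size divided by 16.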
Because UNP does not round up its in_ExeImageSz and out_ExeImageSz to a multiple of 16 before dividing, it may compute a value of out_MinParMem that is 1 paragraph smaller than it should be.