svn commit: r53386 - head/en_US.ISO8859-1/books/developers-handbook/x86
Benedict Reuschling
bcr at FreeBSD.org
Sun Sep 8 20:08:15 UTC 2019
Author: bcr
Date: Sun Sep 8 20:08:15 2019
New Revision: 53386
URL: https://svnweb.freebsd.org/changeset/doc/53386
Log:
Mass cleanup of textproc/igor warnings including:
- use two spaces at sentence start
- space before content
- wrap long line
- start content on same line
- straggling <tag>
- put listing on same line
- add blank line after <tag> on previous line
Modified:
head/en_US.ISO8859-1/books/developers-handbook/x86/chapter.xml
Modified: head/en_US.ISO8859-1/books/developers-handbook/x86/chapter.xml
==============================================================================
--- head/en_US.ISO8859-1/books/developers-handbook/x86/chapter.xml Sun Sep 8 19:40:52 2019 (r53385)
+++ head/en_US.ISO8859-1/books/developers-handbook/x86/chapter.xml Sun Sep 8 20:08:15 2019 (r53386)
@@ -532,16 +532,16 @@ sys.err:
<para>The library approach may seem inconvenient at first because
it requires you to produce a separate file your code depends on.
But it has many advantages: For one, you only need to write it
- once and can use it for all your programs. You can even let
+ once and can use it for all your programs.  You can even let
other assembly language programmers use it, or perhaps use one
- written by someone else. But perhaps the greatest advantage of
+ written by someone else.  But perhaps the greatest advantage of
the library is that your code can be ported to other systems,
even by other programmers, by simply writing a new library
without any changes to your code.</para>
<para>If you do not like the idea of having a library, you can at
least place all your system calls in a separate assembly
- language file and link it with your main program. Here, again,
+ language file and link it with your main program.  Here, again,
all porters have to do is create a new object file to link with
your main program.</para>
</sect2>
@@ -554,7 +554,7 @@ sys.err:
include in your code.</para>
<para>Porters of your software will simply write a new include
- file. No library or external object file is necessary, yet your
+ file.  No library or external object file is necessary, yet your
code is portable without any need to edit the code.</para>
<note>
@@ -651,111 +651,100 @@ access.the.bsd.kernel:
<para>Lines 3-5 are the data: Line 3 starts the data
section/segment. Line 4 contains the string "Hello, World!"
- followed by a new line (<constant>0Ah</constant>). Line 5 creates
+ followed by a new line (<constant>0Ah</constant>).  Line 5 creates
a constant that contains the length of the string from line 4 in
bytes.</para>
- <para> Lines 7-16 contain the code. Note that FreeBSD uses the
+ <para>Lines 7-16 contain the code. Note that FreeBSD uses the
<emphasis>elf</emphasis> file format for its executables, which
requires every program to start at the point labeled
<varname>_start</varname> (or, more precisely, the linker expects
- that). This label has to be global.</para>
+ that).  This label has to be global.</para>
<para>Lines 10-13 ask the system to write <varname>hbytes</varname>
bytes of the <varname>hello</varname> string to
<varname>stdout</varname>.</para>
<para>Lines 15-16 ask the system to end the program with the return
- value of <constant>0</constant>. The <function
+ value of <constant>0</constant>.  The <function
role="syscall">SYS_exit</function> syscall never returns, so the
code ends there.</para>
<note>
<para>If you have come to &unix; from <acronym>&ms-dos;</acronym>
assembly language background, you may be used to writing
- directly to the video hardware. You will never have to worry
- about this in FreeBSD, or any other flavor of &unix;. As far as
+ directly to the video hardware.  You will never have to worry
+ about this in FreeBSD, or any other flavor of &unix;.  As far as
you are concerned, you are writing to a file known as
- <filename>stdout</filename>. This can be the video screen, or a
+ <filename>stdout</filename>.  This can be the video screen, or a
<application>telnet</application> terminal, or an actual file,
- or even the input of another program. Which one it is, is for
+ or even the input of another program.  Which one it is, is for
the system to figure out.</para>
</note>
- <sect2 xml:id="x86-assemble-1"><title>Assembling the Code</title>
+ <sect2 xml:id="x86-assemble-1">
+ <title>Assembling the Code</title>
- <para>Type the code (except the line numbers) in an editor, and save
- it in a file named <filename>hello.asm</filename>. You need
- <application>nasm</application> to assemble it.</para>
+ <para>Type the code (except the line numbers) in an editor, and
+ save it in a file named <filename>hello.asm</filename>. You
+ need <application>nasm</application> to assemble it.</para>
- <sect3 xml:id="x86-get-nasm"><title>Installing <application>nasm</application></title>
+ <sect3 xml:id="x86-get-nasm">
+ <title>Installing <application>nasm</application></title>
<para>If you do not have <application>nasm</application>,
type:</para>
-<screen>&prompt.user; <userinput>su</userinput>
+ <screen>&prompt.user; <userinput>su</userinput>
Password:<userinput><replaceable>your root password</replaceable></userinput>
&prompt.root; <userinput>cd /usr/ports/devel/nasm</userinput>
&prompt.root; <userinput>make install</userinput>
&prompt.root; <userinput>exit</userinput>
&prompt.user;</screen>
-<para>
-You may type <userinput>make install clean</userinput> instead of just
-<userinput>make install</userinput> if you do not want to keep
-<application>nasm</application> source code.
-</para>
+ <para>You may type <userinput>make install clean</userinput>
+ instead of just <userinput>make install</userinput> if you do
+ not want to keep <application>nasm</application> source
+ code.</para>
-<para>
-Either way, FreeBSD will automatically download
-<application>nasm</application> from the Internet,
-compile it, and install it on your system.
-</para>
+ <para>Either way, FreeBSD will automatically download
+ <application>nasm</application> from the Internet, compile it,
+ and install it on your system.</para>
-<note>
-<para>
-If your system is not FreeBSD, you need to get
-<application>nasm</application> from its
-<link xlink:href="https://sourceforge.net/projects/nasm">home
-page</link>. You can still use it to assemble FreeBSD code.
-</para>
-</note>
+ <note>
+ <para>If your system is not FreeBSD, you need to get
+ <application>nasm</application> from its <link
+ xlink:href="https://sourceforge.net/projects/nasm">home
+ page</link>. You can still use it to assemble FreeBSD
+ code.</para>
+ </note>
-<para>
-Now you can assemble, link, and run the code:
-</para>
+ <para>Now you can assemble, link, and run the code:</para>
-<screen>&prompt.user; <userinput>nasm -f elf hello.asm</userinput>
+ <screen>&prompt.user; <userinput>nasm -f elf hello.asm</userinput>
&prompt.user; <userinput>ld -s -o hello hello.o</userinput>
&prompt.user; <userinput>./hello</userinput>
Hello, World!
&prompt.user;</screen>
-
-</sect3>
-
-</sect2>
-
+ </sect3>
+ </sect2>
</sect1>
<sect1 xml:id="x86-unix-filters">
-<title>Writing &unix; Filters</title>
+ <title>Writing &unix; Filters</title>
-<para>
-A common type of &unix; application is a filter—a program
-that reads data from the <filename>stdin</filename>, processes it
-somehow, then writes the result to <filename>stdout</filename>.
-</para>
+ <para>A common type of &unix; application is a filter—a
+ program that reads data from the <filename>stdin</filename>,
+ processes it somehow, then writes the result to
+ <filename>stdout</filename>.</para>
-<para>
-In this chapter, we shall develop a simple filter, and
-learn how to read from <filename>stdin</filename> and write to
-<filename>stdout</filename>. This filter will convert each byte
-of its input into a hexadecimal number followed by a
-blank space.
-</para>
+ <para>In this chapter, we shall develop a simple filter, and
+ learn how to read from <filename>stdin</filename> and write to
+ <filename>stdout</filename>. This filter will convert each byte
+ of its input into a hexadecimal number followed by a blank
+ space.</para>
-<programlisting>
-%include 'system.inc'
+ <programlisting>%include 'system.inc'
section .data
hex db '0123456789ABCDEF'
@@ -793,102 +782,85 @@ _start:
.done:
push dword 0
- sys.exit
-</programlisting>
-<para>
-In the data section we create an array called <varname>hex</varname>.
-It contains the 16 hexadecimal digits in ascending order.
-The array is followed by a buffer which we will use for
-both input and output. The first two bytes of the buffer
-are initially set to <constant>0</constant>. This is where we will write
-the two hexadecimal digits (the first byte also is
-where we will read the input). The third byte is a
-space.
-</para>
+ sys.exit</programlisting>
-<para>
-The code section consists of four parts: Reading the byte,
-converting it to a hexadecimal number, writing the result,
-and eventually exiting the program.
-</para>
+ <para>In the data section we create an array called
+ <varname>hex</varname>. It contains the 16 hexadecimal digits
+ in ascending order. The array is followed by a buffer which
+ we will use for both input and output. The first two bytes of
+ the buffer are initially set to <constant>0</constant>. This
+ is where we will write the two hexadecimal digits (the first
+ byte also is where we will read the input). The third byte is
+ a space.</para>
-<para>
-To read the byte, we ask the system to read one byte
-from <filename>stdin</filename>, and store it in the first byte
-of the <varname>buffer</varname>. The system returns the number
-of bytes read in <varname role="register">EAX</varname>. This will be <constant>1</constant>
-while data is coming, or <constant>0</constant>, when no more input
-data is available. Therefore, we check the value of
-<varname role="register">EAX</varname>. If it is <constant>0</constant>,
-we jump to <varname>.done</varname>, otherwise we continue.
-</para>
+ <para>The code section consists of four parts: Reading the byte,
+ converting it to a hexadecimal number, writing the result, and
+ eventually exiting the program.</para>
-<note>
-<para>
-For simplicity sake, we are ignoring the possibility
-of an error condition at this time.
-</para>
-</note>
+ <para>To read the byte, we ask the system to read one byte from
+ <filename>stdin</filename>, and store it in the first byte of
+ the <varname>buffer</varname>. The system returns the number
+ of bytes read in <varname role="register">EAX</varname>. This
+ will be <constant>1</constant> while data is coming, or
+ <constant>0</constant>, when no more input data is available.
+ Therefore, we check the value of <varname
+ role="register">EAX</varname>. If it is
+ <constant>0</constant>, we jump to <varname>.done</varname>,
+ otherwise we continue.</para>
-<para>
-The hexadecimal conversion reads the byte from the
-<varname>buffer</varname> into <varname role="register">EAX</varname>, or actually just
-<varname role="register">AL</varname>, while clearing the remaining bits of
-<varname role="register">EAX</varname> to zeros. We also copy the byte to
-<varname role="register">EDX</varname> because we need to convert the upper
-four bits (nibble) separately from the lower
-four bits. We store the result in the first two
-bytes of the buffer.
-</para>
+ <note>
+ <para>For simplicity sake, we are ignoring the possibility of
+ an error condition at this time.</para>
+ </note>
-<para>
-Next, we ask the system to write the three bytes
-of the buffer, i.e., the two hexadecimal digits and
-the blank space, to <filename>stdout</filename>. We then
-jump back to the beginning of the program and
-process the next byte.
-</para>
+ <para>The hexadecimal conversion reads the byte from the
+ <varname>buffer</varname> into <varname
+ role="register">EAX</varname>, or actually just <varname
+ role="register">AL</varname>, while clearing the remaining
+ bits of <varname role="register">EAX</varname> to zeros. We
+ also copy the byte to <varname role="register">EDX</varname>
+ because we need to convert the upper four bits (nibble)
+ separately from the lower four bits. We store the result in
+ the first two bytes of the buffer.</para>
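<para>The nibble-by-nibble conversion described above can be sketched in C rather than in the chapter's assembly. This is only an illustration under stated assumptions: <function>byte_to_hex</function> and <varname>hex_digits</varname> are hypothetical names, not part of the handbook's listing.</para>

```c
#include <assert.h>
#include <stddef.h>

/* The 16 hexadecimal digits in ascending order, like the `hex` array. */
static const char hex_digits[] = "0123456789ABCDEF";

/*
 * Convert one byte into two hexadecimal digits.
 * out[0] receives the digit for the upper nibble, out[1] the digit
 * for the lower nibble.  (Hypothetical helper for this sketch only.)
 */
void byte_to_hex(unsigned char byte, char out[2])
{
    assert(out != NULL);
    out[0] = hex_digits[byte >> 4];     /* upper four bits */
    out[1] = hex_digits[byte & 0x0F];   /* lower four bits */
}
```

<para>Each nibble is used as an index into the digit table, which is why the upper and lower four bits have to be separated first.</para>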
-<para>
-Once there is no more input left, we ask the system
-to exit our program, returning a zero, which is
-the traditional value meaning the program was
-successful.
-</para>
+ <para>Next, we ask the system to write the three bytes of the
+ buffer, i.e., the two hexadecimal digits and the blank space,
+ to <filename>stdout</filename>. We then jump back to the
+ beginning of the program and process the next byte.</para>
-<para>
-Go ahead, and save the code in a file named <filename>hex.asm</filename>,
-then type the following (the <userinput>^D</userinput> means press the
-control key and type <userinput>D</userinput> while holding the
-control key down):
-</para>
+ <para>Once there is no more input left, we ask the system to
+ exit our program, returning a zero, which is the traditional
+ value meaning the program was successful.</para>
-<screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput>
+ <para>Go ahead, and save the code in a file named
+ <filename>hex.asm</filename>, then type the following (the
+ <userinput>^D</userinput> means press the control key and type
+ <userinput>D</userinput> while holding the control key
+ down):</para>
+
+ <screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput>
&prompt.user; <userinput>ld -s -o hex hex.o</userinput>
&prompt.user; <userinput>./hex</userinput>
<userinput>Hello, World!</userinput>
48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21 0A <userinput>Here I come!</userinput>
48 65 72 65 20 49 20 63 6F 6D 65 21 0A <userinput>^D</userinput> &prompt.user;</screen>
-<note>
-<para>
-If you are migrating to &unix; from <acronym>&ms-dos;</acronym>,
-you may be wondering why each line ends with <constant>0A</constant>
-instead of <constant>0D 0A</constant>.
-This is because &unix; does not use the cr/lf convention, but
-a "new line" convention, which is <constant>0A</constant> in hexadecimal.
-</para>
-</note>
+ <note>
+ <para>If you are migrating to &unix; from
+ <acronym>&ms-dos;</acronym>, you may be wondering why each
+ line ends with <constant>0A</constant> instead of
+ <constant>0D 0A</constant>. This is because &unix; does not
+ use the cr/lf convention, but a "new line" convention, which
+ is <constant>0A</constant> in hexadecimal.</para>
+ </note>
-<para>
-Can we improve this? Well, for one, it is a bit confusing because
-once we have converted a line of text, our input no longer
-starts at the beginning of the line. We can modify it to print
-a new line instead of a space after each <constant>0A</constant>:
-</para>
+ <para>Can we improve this? Well, for one, it is a bit confusing
+ because once we have converted a line of text, our input no
+ longer starts at the beginning of the line. We can modify it
+ to print a new line instead of a space after each
+ <constant>0A</constant>:</para>
-<programlisting>
-%include 'system.inc'
+ <programlisting>%include 'system.inc'
section .data
hex db '0123456789ABCDEF'
@@ -935,29 +907,26 @@ _start:
.done:
push dword 0
- sys.exit
-</programlisting>
-<para>
-We have stored the space in the <varname role="register">CL</varname> register. We can
-do this safely because, unlike &microsoft.windows;, &unix; system
-calls do not modify the value of any register they do not use
-to return a value in.
-</para>
+ sys.exit</programlisting>
-<para>
-That means we only need to set <varname role="register">CL</varname> once. We have, therefore,
-added a new label <varname>.loop</varname> and jump to it for the next byte
-instead of jumping at <varname>_start</varname>. We have also added the
-<varname>.hex</varname> label so we can either have a blank space or a
-new line as the third byte of the <varname>buffer</varname>.
-</para>
+ <para>We have stored the space in the <varname
+ role="register">CL</varname> register. We can do this
+ safely because, unlike &microsoft.windows;, &unix; system
+ calls do not modify the value of any register they do not use
+ to return a value in.</para>
-<para>
-Once you have changed <filename>hex.asm</filename> to reflect
-these changes, type:
-</para>
+ <para>That means we only need to set <varname
+ role="register">CL</varname> once. We have, therefore,
+ added a new label <varname>.loop</varname> and jump to it for
+ the next byte instead of jumping at <varname>_start</varname>.
+ We have also added the <varname>.hex</varname> label so we can
+ either have a blank space or a new line as the third byte of
+ the <varname>buffer</varname>.</para>
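<para>The choice of the third byte can be sketched in C as follows. This is a minimal illustration, not the handbook's code; <function>format_byte</function> is a hypothetical name, and the sketch recomputes the separator for every byte instead of keeping it in a register.</para>

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill a three-byte output record for one input byte: two hexadecimal
 * digits followed by a separator.  The separator is a new line when the
 * input byte itself was a new line (0Ah), and a blank space otherwise.
 * (Hypothetical helper for this sketch only.)
 */
void format_byte(unsigned char byte, char out[3])
{
    static const char hex_digits[] = "0123456789ABCDEF";

    assert(out != NULL);
    out[0] = hex_digits[byte >> 4];
    out[1] = hex_digits[byte & 0x0F];
    out[2] = (byte == 0x0A) ? '\n' : ' ';
}
```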
-<screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput>
+ <para>Once you have changed <filename>hex.asm</filename> to
+ reflect these changes, type:</para>
+
+ <screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput>
&prompt.user; <userinput>ld -s -o hex hex.o</userinput>
&prompt.user; <userinput>./hex</userinput>
<userinput>Hello, World!</userinput>
@@ -966,42 +935,33 @@ these changes, type:
48 65 72 65 20 49 20 63 6F 6D 65 21 0A
<userinput>^D</userinput> &prompt.user;</screen>
-<para>
-That looks better. But this code is quite inefficient! We
-are making a system call for every single byte twice (once
-to read it, another time to write the output).
-</para>
+ <para>That looks better. But this code is quite inefficient! We
+ are making a system call for every single byte twice (once to
+ read it, another time to write the output).</para>
+ </sect1>
-</sect1>
+ <sect1 xml:id="x86-buffered-io">
+ <title>Buffered Input and Output</title>
-<sect1 xml:id="x86-buffered-io">
-<title>Buffered Input and Output</title>
+ <para>We can improve the efficiency of our code by buffering our
+ input and output. We create an input buffer and read a whole
+ sequence of bytes at one time. Then we fetch them one by one
+ from the buffer.</para>
-<para>
-We can improve the efficiency of our code by buffering our
-input and output. We create an input buffer and read a whole
-sequence of bytes at one time. Then we fetch them one by one
-from the buffer.
-</para>
+ <para>We also create an output buffer. We store our output in
+ it until it is full. At that time we ask the kernel to write
+ the contents of the buffer to
+ <filename>stdout</filename>.</para>
-<para>
-We also create an output buffer. We store our output in it until
-it is full. At that time we ask the kernel to write the contents
-of the buffer to <filename>stdout</filename>.
-</para>
+ <para>The program ends when there is no more input. But we
+ still need to ask the kernel to write the contents of our
+ output buffer to <filename>stdout</filename> one last time,
+ otherwise some of our output would make it to the output
+ buffer, but never be sent out. Do not forget that, or you
+ will be wondering why some of your output is missing.</para>
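<para>The output half of this scheme can be sketched in C on top of the POSIX <function>write</function> call. This is an illustration under assumptions, not the handbook's listing: <varname>struct outbuf</varname>, <function>outbuf_put</function> and <function>outbuf_flush</function> are hypothetical names, and the sketch flushes just before storing into a full buffer.</para>

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define BUFSIZE 2048

/* Hypothetical output buffer for this sketch. */
struct outbuf {
    int fd;                         /* descriptor to write to */
    size_t used;                    /* bytes currently stored */
    unsigned char data[BUFSIZE];
};

/* Write all stored bytes to fd.  Returns 0 on success, -1 on error. */
int outbuf_flush(struct outbuf *ob)
{
    size_t done = 0;

    assert(ob != NULL);
    while (done < ob->used) {
        ssize_t n = write(ob->fd, ob->data + done, ob->used - done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    ob->used = 0;                   /* buffer is empty now */
    return 0;
}

/* Store one byte, flushing first if the buffer is already full. */
int outbuf_put(struct outbuf *ob, unsigned char byte)
{
    assert(ob != NULL);
    if (ob->used == BUFSIZE && outbuf_flush(ob) != 0)
        return -1;
    ob->data[ob->used++] = byte;
    return 0;
}
```

<para>A caller would use <function>outbuf_put</function> for every output byte and call <function>outbuf_flush</function> once more before exiting, which is the final write the paragraph above warns about.</para>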
-<para>
-The program ends when there is no more input. But we still need
-to ask the kernel to write the contents of our output buffer
-to <filename>stdout</filename> one last time, otherwise some of our output
-would make it to the output buffer, but never be sent out.
-Do not forget that, or you will be wondering why some of your
-output is missing.
-</para>
+ <programlisting>%include 'system.inc'
-<programlisting>
-%include 'system.inc'
-
%define BUFSIZE 2048
section .data
@@ -1092,39 +1052,35 @@ write:
add esp, byte 12
sub eax, eax
sub ecx, ecx ; buffer is empty now
- ret
-</programlisting>
-<para>
-We now have a third section in the source code, named
-<varname>.bss</varname>. This section is not included in our
-executable file, and, therefore, cannot be initialized. We use
-<function role="opcode">resb</function> instead of <function role="opcode">db</function>.
-It simply reserves the requested size of uninitialized memory
-for our use.
-</para>
+ ret</programlisting>
-<para>
-We take advantage of the fact that the system does not modify the
-registers: We use registers for what, otherwise, would have to be
-global variables stored in the <varname>.data</varname> section. This is
-also why the &unix; convention of passing parameters to system calls
-on the stack is superior to the Microsoft convention of passing
-them in the registers: We can keep the registers for our own use.
-</para>
+ <para>We now have a third section in the source code, named
+ <varname>.bss</varname>. This section is not included in our
+ executable file, and, therefore, cannot be initialized. We
+ use <function role="opcode">resb</function> instead of
+ <function role="opcode">db</function>. It simply reserves
+ the requested size of uninitialized memory for our use.</para>
-<para>
-We use <varname role="register">EDI</varname> and <varname role="register">ESI</varname> as pointers to the next byte
-to be read from or written to. We use <varname role="register">EBX</varname> and
-<varname role="register">ECX</varname> to keep count of the number of bytes in the
-two buffers, so we know when to dump the output to, or read more
-input from, the system.
-</para>
+ <para>We take advantage of the fact that the system does not
+ modify the registers: We use registers for what, otherwise,
+ would have to be global variables stored in the
+ <varname>.data</varname> section. This is also why the
+ &unix; convention of passing parameters to system calls on the
+ stack is superior to the Microsoft convention of passing them
+ in the registers: We can keep the registers for our own
+ use.</para>
-<para>
-Let us see how it works now:
-</para>
+ <para>We use <varname role="register">EDI</varname> and
+ <varname role="register">ESI</varname> as pointers to the next
+ byte to be read from or written to. We use <varname
+ role="register">EBX</varname> and <varname
+ role="register">ECX</varname> to keep count of the number of
+ bytes in the two buffers, so we know when to dump the output
+ to, or read more input from, the system.</para>
-<screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput>
+ <para>Let us see how it works now:</para>
+
+ <screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput>
&prompt.user; <userinput>ld -s -o hex hex.o</userinput>
&prompt.user; <userinput>./hex</userinput>
<userinput>Hello, World!</userinput>
@@ -1133,17 +1089,15 @@ Let us see how it works now:
48 65 72 65 20 49 20 63 6F 6D 65 21 0A
<userinput>^D</userinput> &prompt.user;</screen>
-<para>
-Not what you expected? The program did not print the output
-until we pressed <userinput>^D</userinput>. That is easy to fix by
-inserting three lines of code to write the output every time
-we have converted a new line to <constant>0A</constant>. I have marked
-the three lines with &gt; (do not copy the &gt; in your
-<filename>hex.asm</filename>).
-</para>
+ <para>Not what you expected? The program did not print the
+ output until we pressed <userinput>^D</userinput>. That is
+ easy to fix by inserting three lines of code to write the
+ output every time we have converted a new line to
+ <constant>0A</constant>. I have marked the three lines with
+ &gt; (do not copy the &gt; in your
+ <filename>hex.asm</filename>).</para>
-<programlisting>
-%include 'system.inc'
+ <programlisting>%include 'system.inc'
%define BUFSIZE 2048
@@ -1238,14 +1192,11 @@ write:
add esp, byte 12
sub eax, eax
sub ecx, ecx ; buffer is empty now
- ret
-</programlisting>
+ ret</programlisting>
-<para>
-Now, let us see how it works:
-</para>
+ <para>Now, let us see how it works:</para>
-<screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput>
+ <screen>&prompt.user; <userinput>nasm -f elf hex.asm</userinput>
&prompt.user; <userinput>ld -s -o hex hex.o</userinput>
&prompt.user; <userinput>./hex</userinput>
<userinput>Hello, World!</userinput>
@@ -1254,265 +1205,214 @@ Now, let us see how it works:
48 65 72 65 20 49 20 63 6F 6D 65 21 0A
<userinput>^D</userinput> &prompt.user;</screen>
-<para>
-Not bad for a 644-byte executable, is it!
-</para>
+ <para>Not bad for a 644-byte executable, is it!</para>
-<note>
-<para>
-This approach to buffered input/output still
-contains a hidden danger. I will discuss—and
-fix—it later, when I talk about the
-<link linkend="x86-buffered-dark-side">dark
-side of buffering</link>.</para>
-</note>
+ <note>
+ <para>This approach to buffered input/output still
+ contains a hidden danger. I will discuss—and
+ fix—it later, when I talk about the <link
+ linkend="x86-buffered-dark-side">dark side of
+ buffering</link>.</para>
+ </note>
-<sect2 xml:id="x86-ungetc">
-<title>How to Unread a Character</title>
+ <sect2 xml:id="x86-ungetc">
+ <title>How to Unread a Character</title>
-<warning><para>
-This may be a somewhat advanced topic, mostly of interest to
-programmers familiar with the theory of compilers. If you wish,
-you may <link linkend="x86-command-line">skip to the next
-section</link>, and perhaps read this later.
-</para>
-</warning>
-<para>
-While our sample program does not require it, more sophisticated
-filters often need to look ahead. In other words, they may need
-to see what the next character is (or even several characters).
-If the next character is of a certain value, it is part of the
-token currently being processed. Otherwise, it is not.
-</para>
+ <warning>
+ <para>This may be a somewhat advanced topic, mostly of
+ interest to programmers familiar with the theory of
+ compilers. If you wish, you may <link
+ linkend="x86-command-line">skip to the next
+ section</link>, and perhaps read this later.</para>
+ </warning>
-<para>
-For example, you may be parsing the input stream for a textual
-string (e.g., when implementing a language compiler): If a
-character is followed by another character, or perhaps a digit,
-it is part of the token you are processing. If it is followed by
-white space, or some other value, then it is not part of the
-current token.
-</para>
+ <para>While our sample program does not require it, more
+ sophisticated filters often need to look ahead. In other
+ words, they may need to see what the next character is (or
+ even several characters). If the next character is of a
+ certain value, it is part of the token currently being
+ processed. Otherwise, it is not.</para>
-<para>
-This presents an interesting problem: How to return the next
-character back to the input stream, so it can be read again
-later?
-</para>
+ <para>For example, you may be parsing the input stream for a
+ textual string (e.g., when implementing a language
+ compiler): If a character is followed by another character,
+ or perhaps a digit, it is part of the token you are
+ processing. If it is followed by white space, or some other
+ value, then it is not part of the current token.</para>
-<para>
-One possible solution is to store it in a character variable,
-then set a flag. We can modify <function>getchar</function> to check the flag,
-and if it is set, fetch the byte from that variable instead of the
-input buffer, and reset the flag. But, of course, that slows us
-down.
-</para>
+ <para>This presents an interesting problem: How to return the
+ next character back to the input stream, so it can be read
+ again later?</para>
-<para>
-The C language has an <function>ungetc()</function> function, just for that
-purpose. Is there a quick way to implement it in our code?
-I would like you to scroll back up and take a look at the
-<function>getchar</function> procedure and see if you can find a nice and
-fast solution before reading the next paragraph. Then come back
-here and see my own solution.
-</para>
+ <para>One possible solution is to store it in a character
+ variable, then set a flag. We can modify
+ <function>getchar</function> to check the flag, and if it is
+ set, fetch the byte from that variable instead of the input
+ buffer, and reset the flag. But, of course, that slows us
+ down.</para>
-<para>
-The key to returning a character back to the stream is in how
-we are getting the characters to start with:
-</para>
+ <para>The C language has an <function>ungetc()</function>
+ function, just for that purpose. Is there a quick way to
+ implement it in our code? I would like you to scroll back
+ up and take a look at the <function>getchar</function>
+ procedure and see if you can find a nice and fast solution
+ before reading the next paragraph. Then come back here and
+ see my own solution.</para>
-<para>
-First we check if the buffer is empty by testing the value
-of <varname role="register">EBX</varname>. If it is zero, we call the
-<function>read</function> procedure.
-</para>
+ <para>The key to returning a character back to the stream is
+ in how we are getting the characters to start with:</para>
-<para>
-If we do have a character available, we use <function role="opcode">lodsb</function>, then
-decrease the value of <varname role="register">EBX</varname>. The <function role="opcode">lodsb</function>
-instruction is effectively identical to:
-</para>
+ <para>First we check if the buffer is empty by testing the
+ value of <varname role="register">EBX</varname>. If it is
+ zero, we call the <function>read</function>
+ procedure.</para>
-<programlisting>
- mov al, [esi]
- inc esi
-</programlisting>
+ <para>If we do have a character available, we use <function
+ role="opcode">lodsb</function>, then decrease the value of
+ <varname role="register">EBX</varname>. The <function
+ role="opcode">lodsb</function> instruction is effectively
+ identical to:</para>
-<para>
-The byte we have fetched remains in the buffer until the next
-time <function>read</function> is called. We do not know when that happens,
-but we do know it will not happen until the next call to
-<function>getchar</function>. Hence, to "return" the last-read byte back
-to the stream, all we have to do is decrease the value of
-<varname role="register">ESI</varname> and increase the value of <varname role="register">EBX</varname>:
-</para>
+ <programlisting>mov al, [esi]
+ inc esi</programlisting>
-<programlisting>
-ungetc:
+ <para>The byte we have fetched remains in the buffer until the
+ next time <function>read</function> is called. We do not know
+ when that happens, but we do know it will not happen until the
+ next call to <function>getchar</function>. Hence, to "return"
+ the last-read byte back to the stream, all we have to do is
+ decrease the value of <varname role="register">ESI</varname>
+ and increase the value of <varname
+ role="register">EBX</varname>:</para>
+
+ <programlisting>ungetc:
dec esi
inc ebx
- ret
-</programlisting>
+ ret</programlisting>
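<para>The same idea can be sketched in C, with a pointer and a counter standing in for <varname role="register">ESI</varname> and <varname role="register">EBX</varname>. This is only an illustration: <varname>struct inbuf</varname>, <function>inbuf_getc</function> and <function>inbuf_ungetc</function> are hypothetical names, and the sketch reports end of input and read errors alike as -1.</para>

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define BUFSIZE 2048

/* Hypothetical input buffer for this sketch. */
struct inbuf {
    int fd;                         /* descriptor to read from */
    size_t left;                    /* unread bytes (EBX in the listing) */
    unsigned char *next;            /* next byte to fetch (ESI) */
    unsigned char data[BUFSIZE];
};

/* Return the next byte, or -1 when no more input is available. */
int inbuf_getc(struct inbuf *ib)
{
    assert(ib != NULL);
    if (ib->left == 0) {            /* buffer empty: refill it */
        ssize_t n = read(ib->fd, ib->data, BUFSIZE);
        if (n <= 0)
            return -1;
        ib->left = (size_t)n;
        ib->next = ib->data;
    }
    ib->left--;
    return *ib->next++;
}

/*
 * Step back over the byte most recently returned by inbuf_getc.
 * Only safe for that one byte: a refill overwrites anything older.
 */
void inbuf_ungetc(struct inbuf *ib)
{
    assert(ib != NULL && ib->next != NULL && ib->next > ib->data);
    ib->next--;
    ib->left++;
}
```

<para>Nothing is copied when a byte is returned to the stream. The byte is still in the buffer, so moving the pointer back and the counter up is enough.</para>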
-<para>
-But, be careful! We are perfectly safe doing this if our look-ahead
-is at most one character at a time. If we are examining more than
-one upcoming character and call <function>ungetc</function> several times
-in a row, it will work most of the time, but not all the time
-(and will be tough to debug). Why?
-</para>
+ <para>But, be careful! We are perfectly safe doing this if our
+ look-ahead is at most one character at a time. If we are
+ examining more than one upcoming character and call
+ <function>ungetc</function> several times in a row, it will
+ work most of the time, but not all the time (and will be tough
+ to debug). Why?</para>
-<para>
-Because as long as <function>getchar</function> does not have to call
-<function>read</function>, all of the pre-read bytes are still in the buffer,
-and our <function>ungetc</function> works without a glitch. But the moment
-<function>getchar</function> calls <function>read</function>,
-the contents of the buffer change.
-</para>
+ <para>Because as long as <function>getchar</function> does not
+ have to call <function>read</function>, all of the pre-read
+ bytes are still in the buffer, and our
+ <function>ungetc</function> works without a glitch. But the
+ moment <function>getchar</function> calls
+ <function>read</function>, the contents of the buffer
+ change.</para>
-<para>
-We can always rely on <function>ungetc</function> working properly on the last
-character we have read with <function>getchar</function>, but not on anything
-we have read before that.
-</para>
+ <para>We can always rely on <function>ungetc</function> working
+ properly on the last character we have read with
+ <function>getchar</function>, but not on anything we have read
+ before that.</para>
-<para>
-If your program reads more than one byte ahead, you have at least
-two choices:
-</para>
+ <para>If your program reads more than one byte ahead, you have
+ at least two choices:</para>
-<para>
-If possible, modify the program so it only reads one byte ahead.
-This is the simplest solution.
-</para>
+ <para>If possible, modify the program so it only reads one byte
+ ahead. This is the simplest solution.</para>
-<para>
-If that option is not available, first of all determine the maximum
-number of characters your program needs to return to the input
-stream at one time. Increase that number slightly, just to be
-sure, preferably to a multiple of 16—so it aligns nicely.
-Then modify the <varname>.bss</varname> section of your code, and create
-a small "spare" buffer right before your input buffer,
-something like this:
-</para>
+ <para>If that option is not available, first of all determine
+ the maximum number of characters your program needs to return
+ to the input stream at one time. Increase that number
+ slightly, just to be sure, preferably to a multiple of
+ 16—so it aligns nicely. Then modify the
+ <varname>.bss</varname> section of your code, and create a
+ small "spare" buffer right before your input buffer, something
+ like this:</para>
-<programlisting>
-section .bss
+ <programlisting>section .bss
resb 16 ; or whatever the value you came up with
ibuffer resb BUFSIZE
-obuffer resb BUFSIZE
-</programlisting>
+obuffer resb BUFSIZE</programlisting>
-<para>
-You also need to modify your <function>ungetc</function> to pass the value
-of the byte to unget in <varname role="register">AL</varname>:
-</para>
+ <para>You also need to modify your <function>ungetc</function>
+ to pass the value of the byte to unget in <varname
+ role="register">AL</varname>:</para>
-<programlisting>
-ungetc:
+ <programlisting>ungetc:
dec esi
inc ebx
mov [esi], al
- ret
-</programlisting>
+ ret</programlisting>
-<para>
-With this modification, you can call <function>ungetc</function>
-up to 17 times in a row safely (the first call will still
-be within the buffer, the remaining 16 may be either within
-the buffer or within the "spare").
-</para>
+ <para>With this modification, you can call
+ <function>ungetc</function> up to 17 times in a row safely
+ (the first call will still be within the buffer, the remaining
+ 16 may be either within the buffer or within the
+ "spare").</para>
+ </sect2>
+ </sect1>
-</sect2>
+ <sect1 xml:id="x86-command-line">
+ <title>Command Line Arguments</title>
-</sect1>
+ <para>Our <application>hex</application> program will be more
+ useful if it can read the names of an input and output file from
+ its command line, i.e., if it can process the command line
+ arguments. But... Where are they?</para>
-<sect1 xml:id="x86-command-line"><title>Command Line Arguments</title>
+ <para>Before a &unix; system starts a program, it <function
+ role="opcode">push</function>es some data on the stack, then
+ jumps at the <varname>_start</varname> label of the program.
+ Yes, I said jumps, not calls. That means the data can be
+ accessed by reading <varname>[esp+offset]</varname>, or by
+ simply <function role="opcode">pop</function>ping it.</para>
-<para>
-Our <application>hex</application> program will be more useful if it can
-read the names of an input and output file from its command
-line, i.e., if it can process the command line arguments.
-But... Where are they?
-</para>
+ <para>The value at the top of the stack contains the number of
+ command line arguments. It is traditionally called
+ <varname>argc</varname>, for "argument count."</para>
-<para>
-Before a &unix; system starts a program, it <function role="opcode">push</function>es some
-data on the stack, then jumps at the <varname>_start</varname>
-label of the program. Yes, I said jumps, not calls. That means the
-data can be accessed by reading <varname>[esp+offset]</varname>,
-or by simply <function role="opcode">pop</function>ping it.
-</para>
+ <para>Command line arguments follow next, all
+ <varname>argc</varname> of them. These are typically referred
+ to as <varname>argv</varname>, for "argument value(s)." That
+ is, we get <varname>argv[0]</varname>,
+ <varname>argv[1]</varname>, <varname>...</varname>,
+ <varname>argv[argc-1]</varname>. These are not the actual
+ arguments, but pointers to arguments, i.e., memory addresses of
+ the actual arguments. The arguments themselves are
+ NUL-terminated character strings.</para>
-<para>
-The value at the top of the stack contains the number of
-command line arguments. It is traditionally called
-<varname>argc</varname>, for "argument count."
-</para>
+ <para>The <varname>argv</varname> list is followed by a NULL
+ pointer, which is simply a <constant>0</constant>. There is
+ more, but this is enough for our purposes right now.</para>
-<para>
-Command line arguments follow next, all <varname>argc</varname> of them.
-These are typically referred to as <varname>argv</varname>, for
-"argument value(s)." That is, we get <varname>argv[0]</varname>,
-<varname>argv[1]</varname>, <varname>...</varname>,
-<varname>argv[argc-1]</varname>. These are not the actual
-arguments, but pointers to arguments, i.e., memory addresses of
-the actual arguments. The arguments themselves are
-NUL-terminated character strings.
-</para>
+ <note>
+ <para>If you have come from the <acronym>&ms-dos;</acronym>
+ programming environment, the main difference is that each
+ argument is in a separate string. The second difference is
+ that there is no practical limit on how many arguments there
+ can be.</para>
+ </note>
-<para>
-The <varname>argv</varname> list is followed by a NULL pointer,
-which is simply a <constant>0</constant>. There is more, but this is
-enough for our purposes right now.
-</para>
+ <para>Armed with this knowledge, we are almost ready for the next
+ version of <filename>hex.asm</filename>. First, however, we
+ need to add a few lines to
+ <filename>system.inc</filename>:</para>
-<note>
-<para>
-If you have come from the <acronym>&ms-dos;</acronym> programming
-environment, the main difference is that each argument is in
-a separate string. The second difference is that there is no
-practical limit on how many arguments there can be.
-</para>
-</note>
+ <para>First, we need to add two new entries to our list of system
+ call numbers:</para>
-<para>
-Armed with this knowledge, we are almost ready for the next
-version of <filename>hex.asm</filename>. First, however, we need to
-add a few lines to <filename>system.inc</filename>:
-</para>
+ <programlisting>%define SYS_open 5
+%define SYS_close 6</programlisting>
-<para>
-First, we need to add two new entries to our list of system
-call numbers:
-</para>
+ <para>Then we add two new macros at the end of the file:</para>
-<programlisting>
-%define SYS_open 5
-%define SYS_close 6
-</programlisting>
-
-<para>
-Then we add two new macros at the end of the file:
-</para>
-
-<programlisting>
-%macro sys.open 0
+ <programlisting>%macro sys.open 0
system SYS_open
%endmacro
%macro sys.close 0
system SYS_close
-%endmacro
-</programlisting>
+%endmacro</programlisting>
-<para>
-Here, then, is our modified source code:
-</para>
+ <para>Here, then, is our modified source code:</para>
-<programlisting>
-%include 'system.inc'
+ <programlisting>%include 'system.inc'
%define BUFSIZE 2048
@@ -1653,234 +1553,192 @@ write:
*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***