<chapter id="parsing">
<chapterinfo>
	<author>
		<firstname>Chris</firstname><surname>Hertel</surname>
	</author>
	<pubdate>November 1997</pubdate>
</chapterinfo>

<title>The smb.conf file</title>

<sect1>
<title>Lexical Analysis</title>

<para>
The file is processed line by line.  The lexical analyzer (params.c)
recognizes four types of lines:
</para>

<orderedlist>
<listitem><para>
Blank lines - Lines containing only whitespace.
</para></listitem>
<listitem><para>
Comment lines - Lines beginning with either a semi-colon or a
pound sign (';' or '#').
</para></listitem>
<listitem><para>
Section header lines - Lines beginning with an open square bracket ('[').
</para></listitem>
<listitem><para>
Parameter lines - Lines beginning with any other character.
(The default line type.)
</para></listitem>
</orderedlist>

<para>
The first two are handled exclusively by the lexical analyzer, which
ignores them.  The latter two line types are scanned for the following
tokens:
</para>

<orderedlist>
<listitem><para>
Section names
</para></listitem>
<listitem><para>
Parameter names
</para></listitem>
<listitem><para>
Parameter values
</para></listitem>
</orderedlist>

<para>
These are the only tokens passed to the parameter loader
(loadparm.c).  Parameter names and values are divided from one
another by an equal sign: '='.
</para>
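
<para>
The four line types listed above can be told apart by looking at the
first non-whitespace character.  The following is a minimal Python
sketch of that idea, not the code in params.c; the function name
classify_line and the returned labels are hypothetical.  Note also that
Python's notion of whitespace is somewhat broader than that of isspace().
</para>

<para><programlisting>
def classify_line(line):
    """Return 'blank', 'comment', 'section' or 'parameter' for one line."""
    # Skip leading whitespace before examining the first character.
    text = line.lstrip()
    if text == "":
        return "blank"
    if text[0] in ";#":
        return "comment"
    if text[0] == "[":
        return "section"
    # Anything else falls through to the default line type.
    return "parameter"
</programlisting></para>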

<sect2>
<title>Handling of Whitespace</title>

<para>
Whitespace is defined as all characters recognized by the isspace()
function (see ctype(3C)) except for the newline character ('\n').
The newline is excluded because it identifies the end of the line.
</para>

<orderedlist>
<listitem><para>
The lexical analyzer scans past whitespace at the beginning of a line.
</para></listitem>

<listitem><para>
Section and parameter names may contain internal whitespace.  Each run
of whitespace within a name is compressed to a single space character.
</para></listitem>

<listitem><para>
Internal whitespace within a parameter value is kept verbatim with 
the exception of carriage return characters ('\r'), all of which
are removed.
</para></listitem>

<listitem><para>
Leading and trailing whitespace is removed from names and values.
</para></listitem>

</orderedlist>
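
<para>
As a rough illustration of the rules above, names and values could be
normalized as follows.  This is a Python sketch under the assumption
that the input is a single line with its newline already removed; the
helper names normalize_name and normalize_value are hypothetical and
do not appear in params.c.
</para>

<para><programlisting>
def normalize_name(raw):
    """Collapse each whitespace run to one space and trim both ends."""
    # split() with no argument splits on whitespace runs and drops
    # leading and trailing whitespace, so join() yields single spaces.
    return " ".join(raw.split())


def normalize_value(raw):
    """Drop carriage returns, keep other internal whitespace, trim ends."""
    return raw.replace("\r", "").strip()
</programlisting></para>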

</sect2>

<sect2>
<title>Handling of Line Continuation</title>

<para>
Long section header and parameter lines may be extended across
multiple lines by use of the backslash character ('\\').  Line
continuation is ignored for blank and comment lines.
</para>

<para>
If the last (non-whitespace) character within a section header or on
a parameter line is a backslash, then the next line will be
(logically) concatenated with the current line by the lexical
analyzer.  For example:
</para>

<para><programlisting>
    param name = parameter value string \
    with line continuation.
</programlisting></para>

<para>This would be read as:</para>

<para><programlisting>
    param name = parameter value string     with line continuation.
</programlisting></para>

<para>
Note that there are five spaces following the word 'string': the one
space between 'string' and '\\' in the top line, plus the four
preceding the word 'with' in the second line.
(Yes, I'm counting the indentation.)
</para>

<para>
Line continuation characters are ignored on blank lines and at the end
of comments.  They are *only* recognized within section and parameter
lines.
</para>
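
<para>
The joining of physical lines into logical lines might be sketched as
below.  This is a Python illustration only, and the function name
join_continuations is hypothetical.  It assumes that a section header
is complete once its closing bracket has been seen, and it leaves the
leading whitespace of the first line in place for a later stage to
strip.
</para>

<para><programlisting>
def join_continuations(lines):
    """Merge physical lines into logical lines, honoring backslashes."""
    logical = []
    pending = None  # text of an unfinished logical line, if any
    for line in lines:
        text = line.rstrip("\n")
        if pending is None:
            head = text.lstrip()
            if head == "" or head[0] in ";#":
                # Blank and comment lines: a backslash has no effect.
                logical.append(text)
                continue
            pending = ""
        joined = pending + text
        # Assumption: nothing after the closing bracket of a section
        # header matters, a trailing backslash included.
        header_done = joined.lstrip().startswith("[") and "]" in joined
        body = text.rstrip()
        if body.endswith("\\") and not header_done:
            # Drop the backslash and what follows it; keep what precedes.
            pending += body[:-1]
        else:
            logical.append(joined)
            pending = None
    if pending is not None:
        # The input ended on a continuation; emit what was collected.
        logical.append(pending)
    return logical
</programlisting></para>

<para>
Because a continued line is appended verbatim, its indentation becomes
part of the value, which is where the five spaces in the example above
come from.
</para>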

</sect2>

<sect2>
<title>Line Continuation Quirks</title>

<para>Note the following example:</para>

<para><programlisting>
    param name = parameter value string \
    \
    with line continuation.
</programlisting></para>

<para>
The middle line is *not* parsed as a blank line because it is first
concatenated with the top line.  The result is:
</para>

<para><programlisting>
param name = parameter value string         with line continuation.
</programlisting></para>

<para>The same is true for comment lines.</para>

<para><programlisting>
    param name = parameter value string \
    ; comment \
    with a comment.
</programlisting></para>

<para>This becomes:</para>

<para><programlisting>
param name = parameter value string     ; comment     with a comment.
</programlisting></para>

<para>
On a section header line, the closing bracket (']') is considered a
terminating character, and the rest of the line is ignored.  The lines
</para>

<para><programlisting>
    [ section   name ] garbage \
    param  name  = value
</programlisting></para>

<para>are read as</para>

<para><programlisting>
    [section name]
    param name = value
</programlisting></para>
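
<para>
The treatment of the closing bracket can be illustrated with a short
Python sketch.  The function name parse_section_header is hypothetical,
and raising ValueError for a malformed header is an assumption made for
the example, not a description of how params.c reports errors.
</para>

<para><programlisting>
def parse_section_header(line):
    """Return the section name, ignoring everything after ']'."""
    text = line.lstrip()
    if not text.startswith("["):
        raise ValueError("not a section header line")
    end = text.find("]")
    if end == -1:
        raise ValueError("missing closing bracket")
    # Compress whitespace runs inside the name and trim both ends.
    return " ".join(text[1:end].split())
</programlisting></para>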

</sect2>
</sect1>

<sect1>
<title>Syntax</title>

<para>The syntax of the smb.conf file is as follows:</para>

<para><programlisting>
  &lt;file&gt;            ::=  { &lt;section&gt; } EOF
  &lt;section&gt;         ::=  &lt;section header&gt; { &lt;parameter line&gt; }
  &lt;section header&gt;  ::=  '[' NAME ']'
  &lt;parameter line&gt;  ::=  NAME '=' VALUE NL
</programlisting></para>

<para>This means that:</para>

<orderedlist>
<listitem><para>
	A file is made up of zero or more sections, and is terminated by
	an EOF (we knew that).
</para></listitem>

<listitem><para>
	A section is made up of a section header followed by zero or more
	parameter lines.
</para></listitem>

<listitem><para>
	A section header is identified by an opening bracket and
	terminated by the closing bracket.  The enclosed NAME identifies
	the section.
</para></listitem>

<listitem><para>
	A parameter line is divided into a NAME and a VALUE.  The *first*
	equal sign on the line separates the NAME from the VALUE.  The
	VALUE is terminated by a newline character (NL = '\n').
</para></listitem>

</orderedlist>
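
<para>
The last rule, splitting at the first equal sign, can be shown with a
small Python sketch.  The function name parse_parameter_line is
hypothetical, and raising ValueError when no equal sign is present is
an assumption made for the example.
</para>

<para><programlisting>
def parse_parameter_line(line):
    """Split a parameter line into (NAME, VALUE) at the first '='."""
    # partition() splits at the first occurrence only, so any later
    # equal signs stay inside the value.
    name, sep, value = line.partition("=")
    if sep == "":
        raise ValueError("no '=' on parameter line")
    clean_name = " ".join(name.split())
    clean_value = value.replace("\r", "").strip()
    return clean_name, clean_value
</programlisting></para>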

<sect2>
<title>About params.c</title>

<para>
The parsing of the config file is a bit unusual if you are used to
lex, yacc, bison, etc.  Both lexical analysis (scanning) and parsing
are performed by params.c.  Values are loaded via callbacks to
loadparm.c.
</para>
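
<para>
The callback arrangement can be pictured with the following Python
sketch.  It is an illustration only: the names process, on_section,
on_parameter and load_config are hypothetical and are not the
interfaces of params.c or loadparm.c.  It assumes that continuation
lines have already been joined, and it treats a parameter that appears
before any section header as an error.
</para>

<para><programlisting>
def process(lines, on_section, on_parameter):
    """Scan lines and report section names and parameters via callbacks."""
    for line in lines:
        text = line.strip()
        if text == "" or text[0] in ";#":
            continue  # blank and comment lines never reach the loader
        if text[0] == "[":
            end = text.find("]")
            if end == -1:
                raise ValueError("missing closing bracket")
            on_section(" ".join(text[1:end].split()))
        else:
            name, sep, value = text.partition("=")
            if sep == "":
                raise ValueError("no '=' on parameter line")
            on_parameter(" ".join(name.split()),
                         value.replace("\r", "").strip())


def load_config(lines):
    """Example loader: collect parameters into nested dictionaries."""
    config = {}
    state = {"section": None}

    def on_section(name):
        state["section"] = name
        config.setdefault(name, {})

    def on_parameter(name, value):
        if state["section"] is None:
            raise ValueError("parameter before any section header")
        config[state["section"]][name] = value

    process(lines, on_section, on_parameter)
    return config
</programlisting></para>

<para>
The scanner itself knows nothing about what a section or parameter
means; it only hands over the three kinds of tokens, and the loader
decides what to do with them.
</para>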
</sect2>
</sect1>
</chapter>