[ast-users] ksh93 double byte space handling
lijo george
8 years ago
Hi,

The attached testscript has a leading double byte space separator before
the for loop closing "done" keyword. This fails with a syntax error while
parsing.

Is it a bug or is it expected behaviour?

I've tried it with the ksh93u+ and ksh93v- versions on a Solaris setup.
bash and zsh also fail, so I suspect it might not be a bug, but
could someone please confirm this?

Here's a sample output.

***@S11_3_SRU:~# echo $LANG
ja_JP.UTF-8
***@S11_3_SRU:~# cat space.ksh
#!/bin/ksh
for i in 1 2
do
echo $i
done # leading double byte space character
***@S11_3_SRU:~# od -xc space.ksh
0000000 2321 2f62 696e 2f6b 7368 0a66 6f72 2069
# ! / b i n / k s h \n f o r i
0000020 2069 6e20 3120 320a 646f 0a65 6368 6f20
i n 1 2 \n d o \n e c h o
0000040 2469 0ae3 8080 646f 6e65 0a00
$ i \n 343 200 200 d o n e \n
0000053
***@S11_3_SRU:~# ksh --version
version sh (AT&T Research) 93u+ 2012-08-01
***@S11_3_SRU:~# ksh space.ksh
space.ksh: syntax error at line 6: `for' unmatched
***@S11_3_SRU:~# ./ksh-2014
***@S11_3_SRU:~# echo ${.sh.version}
Version AIJMP 93v- 2014-12-24
***@S11_3_SRU:~# ./space.ksh
./space.ksh: syntax error at line 6: `for' unmatched
***@S11_3_SRU:~#

Thanks,
Lijo
Philippe Bergheaud
8 years ago
...
You should remove the (invisible) character 0343 (0xe3), before the two
spaces.

Philippe
lijo george
8 years ago
Thanks for the suggestion Philippe.
But I'm a bit confused, though. Isn't "0xe3 0x80 0x80" the UTF-8
representation of the space character?
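
As a side note on the byte sequence in question: the three bytes follow the standard three-byte UTF-8 layout (1110xxxx 10xxxxxx 10xxxxxx), so the code point can be recovered with a little shell arithmetic. The sketch below is only an illustration; the variable names are made up for this example. It yields U+3000, the ideographic (full-width) space, which is a different character from the ASCII space U+0020.

```shell
# Bytes taken from the od dump above: octal 343 200 200 = hex e3 80 80.
# (b1, b2, b3 and codepoint are hypothetical names used only in this sketch.)
b1=0xe3
b2=0x80
b3=0x80

# Three-byte UTF-8: keep the low 4 bits of the lead byte and the low 6 bits
# of each continuation byte, then concatenate them.
codepoint=$(printf 'U+%04X' $(( (($b1 & 0x0f) << 12) | (($b2 & 0x3f) << 6) | ($b3 & 0x3f) )))
echo "$codepoint"
```

So the sequence is a valid encoding of a space-like character, just not of the one the shell grammar treats as a blank.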


Thanks,
Lijo

On Tue, Apr 25, 2017 at 5:49 PM, Philippe Bergheaud <
...
Richard Hamilton
8 years ago
I'm going to consider this _without_ looking at the ksh source, because
mortals will at most look at documentation (and because documentation
should be accurate enough that they shouldn't _have_ to look at source).

My very cursory reading of the man page* leaves it a bit ambiguous whether
that should work:

A blank is a tab or a space. An identifier is a sequence of letters,
digits, or underscores starting with a letter or underscore.
Identifiers are used as components of variable names. A vname is a
sequence of one or more identifiers separated by a . and optionally
preceded by a .. Vnames are used as function and variable names. A
word is a sequence of characters from the character set defined by
the current locale, excluding non-quoted metacharacters.

"A blank is a tab or a space" is more restrictive than "A word is a
sequence of characters from the character set defined by the current
locale, excluding non-quoted metacharacters". And if I try a vertical
tab, formfeed, or carriage return (all plain ASCII characters classified as
white space by isspace(3)) before "done", I get the same error. So it
looks like the more restrictive interpretation holds: only tabs and the
basic space character are acceptable in the code as white space. Of
course, anything should be ok in a quoted string (except whatever closes
the quotes); or rather, anything except a null byte, which does NOT work**
(ksh isn't perl - the latter goes out of its way to tolerate just about
anything).
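
If only tab and the ASCII space count as blanks, one way out is to replace the stray bytes before running the script. The following is a minimal sketch, assuming a POSIX sh, sed, and mktemp are available; the temporary file names and the variable names are hypothetical, and the script body simply mirrors space.ksh from the first post.

```shell
# Recreate a script like space.ksh, with e3 80 80 (octal 343 200 200)
# in front of "done". File and variable names are hypothetical.
broken=$(mktemp)
fixed=$(mktemp)
printf 'for i in 1 2\ndo\necho $i\n\343\200\200done\n' > "$broken"

# Build the three-byte sequence once, then replace every occurrence with an
# ASCII space. LC_ALL=C makes sed operate on bytes whatever the user's locale.
ideo=$(printf '\343\200\200')
LC_ALL=C sed "s/$ideo/ /g" "$broken" > "$fixed"

# -n parses without executing; exit status 0 means the syntax is accepted.
sh -n "$fixed" && echo "fixed script parses"

# Run the repaired script and keep its output.
result=$(sh "$fixed")
echo "$result"

rm -f "$broken" "$fixed"
```

Note that this blanket replacement would also alter an ideographic space inside a quoted string, where it is perfectly legitimate, so it is better suited to a one-off cleanup than to an automatic step.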

However, I wouldn't do it even if it should work, because then it would
only work in an appropriate (UTF-8) locale; it would certainly be an error
in the C locale regardless. Personally, I would confine anything that is
not sensible in the C locale to quoted string constants; one does NOT want
code that does nasty things depending on which locale is in use.
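
A cheap way to follow that advice is to check a script for bytes outside plain ASCII before trusting it. This is a rough sketch, assuming a POSIX tr and wc; the sample file and the variable names are hypothetical. It counts every byte that is not a tab, a newline, or printable ASCII, and it cannot tell whether such a byte sits inside a quoted string, so a non-zero count only means "look closer".

```shell
# Hypothetical sample: one clean line, one line starting with e3 80 80.
script=$(mktemp)
printf 'echo ok\n\343\200\200echo hidden\n' > "$script"

# Delete tab (\11), newline (\12) and printable ASCII (\40-\176);
# whatever survives is outside the set that is safe in the C locale.
count=$(LC_ALL=C tr -d '\11\12\40-\176' < "$script" | wc -c | tr -d ' ')
echo "suspicious bytes: $count"

rm -f "$script"
```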

* ${.sh.version} on my Mac is Version AJM 93u+ 2012-08-01, which I gather
is reasonably current. :-)

** the following produces an interesting error:

0000000 # ! / b i n / k s h \n \n e c h
0000020 o " \0 t e s t i n g " \n
0000035
$ ./tryme.ksh
./tryme.ksh: syntax error at line 3: `zero byte' unexpected
...
lijo george
8 years ago
So I guess the observed behaviour is not a bug but intended behaviour.

It's interesting that this used to work with the old ksh88 version, which
might have been due to a less complicated parsing mechanism.

Thanks,
Lijo
...