This chapter describes the concepts of package X-Symbol. It contains quite a few forward references to features that are based on these concepts, such as 4. X-Symbol's Input Methods, and 5. Features of Package X-Symbol.
3.1 Token Language | What does an X-Symbol character represent?
3.2 Conversion: Decoding and Encoding | Decoding tokens, encoding characters.
3.3 Minor Mode | How to control the behavior of X-Symbol.
3.4 Poor Man's Mule: Running Under XEmacs/no-Mule | Running X-Symbol under XEmacs/no-Mule.
3.5 The Role of font-lock | Why does X-Symbol need font-lock?
3.6 Character Group and Token Classes | Character group and token classes.
3.1 Token Language
As mentioned in the overview, "X-Symbol Characters" in the buffer are represented by "tokens" in the file. The correspondence between these is determined by the token language, which is closely related to the major mode of the current buffer. E.g., character alpha stands for \alpha in LaTeX buffers.
For details of the predefined token languages "TeX macro" (tex), "SGML entity" (sgml), "BibTeX macro" (bib), and "TeXinfo command" (texi), see 6. Supported Token Languages.
The token language determines the conversion between X-Symbol characters and tokens (see section 3.2 Conversion: Decoding and Encoding), the input methods (see section 4. X-Symbol's Input Methods), and various other features (see section 5. Features of Package X-Symbol).
The token language is defined by the following buffer-local variable:
x-symbol-language
    %% Local Variables:
    %% x-symbol-language: tex
    %% End:
3.2 Conversion: Decoding and Encoding
As mentioned, X-Symbol characters in the buffer are represented by tokens in the file. Thus, we need some conversion from tokens to characters, called decoding, and some conversion from characters to tokens, called encoding.
We have the additional problem that some characters are not only represented by tokens, but also via some 8bit character encoding.
Package X-Symbol supports the following 8bit character encodings: Latin-1 (iso-8859-1), Latin-2 (iso-8859-2), Latin-3 (iso-8859-3), Latin-5 (iso-8859-9), and Latin-9 (iso-8859-15). It currently supports fewer encodings with XEmacs on Windows (see section 2.1 Requirements).
3.2.1 Normal File and Default Encoding
3.2.2 File Coding of 8bit Characters | Specific encoding of a file.
3.2.3 Store or Encode 8bit Characters | Do you want to store 8bit characters?
3.2.4 Unique Decoding | Restrict decoding to avoid normalization?
3.2.5 Conversion Commands | Interactive encoding and decoding.
3.2.6 Copy & Paste with Conversion | Copy & paste with conversion.
3.2.7 Character Aliases | Different charsets include the same chars.
3.2.1 Normal File and Default Encoding
As mentioned, some characters have an 8bit file encoding, and X-Symbol needs to know which 8bit file encoding you normally use when visiting a file and saving a buffer.
With Mule support, Emacs/XEmacs can recognize the normal file encoding, also called a coding system (see section `Recognize Coding' in XEmacs User's Manual).
Without Mule support, XEmacs can usually only support 8bit characters of one encoding; this encoding corresponds to the charset/registry of your default font. Here, the normal file encoding is the default encoding:
x-symbol-default-coding
nil. The variable must be set before X-Symbol has been initialized. See section 2.4 Make XEmacs Initialize X-Symbol During Startup.
locale, or to be more exact:
    locale -ck code_set_name charmap
nil is the same as iso-8859-1.
With Mule support, you get a warning if the command lists a supported encoding which is different from the encoding deduced from the Mule language environment. Value nil makes sure that X-Symbol file encoding detection (see section 3.2.2 File Coding of 8bit Characters) only works if Emacs has detected the same encoding; it works like iso-8859-1 otherwise.
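The locale query named above can also be inspected outside Emacs. The following Python sketch is only an illustration and not part of X-Symbol: the function names `parse_charmap` and `to_x_symbol_coding` and the sample string are hypothetical, and the sketch assumes that the command prints a line of the form `charmap="..."`. It looks only at the `charmap` keyword and maps it to one of the five encodings listed in section 3.2.

```python
import re

# The five 8bit encodings named in section 3.2.
SUPPORTED = {"iso-8859-1", "iso-8859-2", "iso-8859-3", "iso-8859-9", "iso-8859-15"}


def parse_charmap(locale_output):
    """Return the charmap value found in the locale output, or None."""
    match = re.search(r'^charmap="?([^"\n]*)"?$', locale_output, re.MULTILINE)
    return match.group(1) if match else None


def to_x_symbol_coding(charmap):
    """Map a charmap name to a supported encoding name, or None if unsupported."""
    if charmap is None:
        return None
    name = charmap.lower()
    return name if name in SUPPORTED else None


# Hypothetical sample output; real output depends on your system and locale.
sample = 'LC_CTYPE\ncharmap="ISO-8859-2"\n'
print(to_x_symbol_coding(parse_charmap(sample)))  # iso-8859-2
```

A charmap outside the supported set, such as UTF-8, yields `None` in this sketch; how X-Symbol itself treats such a result is described in the surrounding text, not by this code.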
3.2.2 File Coding of 8bit Characters
X-Symbol can use a different encoding for single buffers/files, even if you use X-Symbol on XEmacs without Mule support. To do so, set the following buffer-local variable:
x-symbol-coding
nil represents the normal file encoding (see section 3.2.1 Normal File and Default Encoding).
With Mule support, any value other than nil is considered invalid if the normal file encoding is neither the same as this value nor the same as the default encoding. I.e., if your default encoding is nil, X-Symbol's file encoding detection never takes precedence over Emacs' one, i.e., the normal file encoding.
You can set this variable in the "local variables list" near the end of the file (see section `File Variables' in XEmacs User's Manual), e.g.:
    <!-- Local Variables: -->
    <!-- x-symbol-coding: iso-8859-2 -->
    <!-- End: -->
x-symbol-auto-coding-search-limit
tex)) or `<meta ... charset=...>' (see section 6.3 Token Language "SGML entity" (sgml)) in the first 10000 characters.
nil, the normal file encoding is unsupported, and the variable x-symbol-coding is not specified.
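The idea of looking for a `<meta ... charset=...>' declaration within a bounded prefix of the file can be illustrated as follows. This Python sketch is not X-Symbol's implementation: the regular expression, the function name `guess_coding`, and the sample HTML are assumptions made for illustration, and only the limit of 10000 characters is taken from the text above.

```python
import re

# Assumed shape of a charset declaration; X-Symbol's own pattern may differ.
META_CHARSET = re.compile(
    r"""<meta\b[^>]*\bcharset=["']?([A-Za-z0-9_-]+)""", re.IGNORECASE
)


def guess_coding(text, search_limit=10000):
    """Return the charset declared within the first `search_limit` characters."""
    match = META_CHARSET.search(text[:search_limit])
    return match.group(1).lower() if match else None


html = ('<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=iso-8859-2"></head>')
print(guess_coding(html))                # iso-8859-2
print(guess_coding(" " * 10000 + html))  # None: declaration lies beyond the limit
```

Bounding the search keeps the check cheap for large files, at the price of missing declarations that appear late in the file.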
3.2.3 Store or Encode 8bit Characters
You can specify that 8bit characters (according to the coding in your file, see 3.2.2 File Coding of 8bit Characters) are not encoded to tokens when saving a file, by setting the following buffer-local variable:
x-symbol-8bits
You can set this variable in the "local variables list" near the end of the file (see section `File Variables' in XEmacs User's Manual), e.g.:
    %% Local Variables:
    %% x-symbol-8bits: t
    %% End:
x-symbol-coding, or searching in the file for 8bit characters:
x-symbol-auto-8bit-search-limit
x-symbol-8bits accordingly. Then, a non-nil value also implies unique decoding (see section 3.2.4 Unique Decoding).
While the variable x-symbol-8bits usually only influences the encoding, it also influences the decoding if you choose to decode uniquely (see section 3.2.4 Unique Decoding).
Setting variable x-symbol-8bits to nil does not necessarily mean that the file will not contain 8bit characters: the characters might have no token representation in the current token language (see section 6.5 Token Language "TeXinfo command" (texi)), or they are glyphs for unused code points in the Latin-3 charset. In both cases, it is unlikely that you have inserted these invalid characters via X-Symbol's input methods (see section 4.1 Common Behavior of All Input Methods); you have probably copied them into the current buffer.
3.2.4 Unique Decoding
Token languages might define more than one token representing the same character. When decoding and encoding these tokens, they will be normalized to one form, the canonical representation. E.g., with language tex, visiting a file with tokens \neq and \ne converts both tokens to character notequal; saving the buffer stores the character as token \neq in both occurrences.
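The normalization described above can be modelled with two small lookup tables. The following Python sketch is a toy model under stated assumptions, not X-Symbol's implementation: the table contents, the character names, and the rule that the first token listed for a character is its canonical token are all hypothetical.

```python
# Hypothetical miniature token table for language tex: token -> character name.
# The first token listed for a character is treated as its canonical token.
DECODE = {r"\neq": "notequal", r"\ne": "notequal", r"\alpha": "alpha"}

CANONICAL = {}
for token, char in DECODE.items():
    CANONICAL.setdefault(char, token)


def decode(tokens):
    """Convert tokens to character names; unknown tokens are kept unchanged."""
    return [DECODE.get(tok, tok) for tok in tokens]


def encode(chars):
    """Convert character names back to their canonical tokens."""
    return [CANONICAL.get(ch, ch) for ch in chars]


file_tokens = [r"\neq", r"\ne", r"\alpha"]
buffer_chars = decode(file_tokens)
print(buffer_chars)          # ['notequal', 'notequal', 'alpha']
print(encode(buffer_chars))  # ['\\neq', '\\neq', '\\alpha']
```

After one decode and encode round trip, the second token has changed from \ne to \neq, which is the normalization the text refers to.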
It can also happen that a file contains both an 8bit character and a token which would be converted to exactly that character. When saving the file, both characters are either not encoded, or both are encoded to the same token.
Normally, this is no problem. But if you redefine standard TeX macros, it certainly could be the case (see section 6.2.3 Problems with TeX Macros)! For this reason, package X-Symbol provides the following buffer-local variable:
x-symbol-unique
x-symbol-8bits is non-nil (see section 3.2.3 Store or Encode 8bit Characters), do not decode tokens which would be decoded to 8bit characters (according to the coding in your file, see 3.2.2 File Coding of 8bit Characters).
You can set this variable in the "local variables list" near the end of the file (see section `File Variables' in XEmacs User's Manual), e.g., together with a setting for x-symbol-8bits:
    %% Local Variables:
    %% x-symbol-8bits: t
    %% x-symbol-unique: t
    %% End:
t if X-Symbol mode is not automatically turned on.
If the file encoding is invalid (see section 3.2.2 File Coding of 8bit Characters) and x-symbol-8bits is non-nil (see section 3.2.3 Store or Encode 8bit Characters), X-Symbol always uses unique decoding (see section 3.2.4 Unique Decoding).
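The effect of unique decoding can be pictured with a toy decoder. This Python sketch is illustrative only: the tables and the parameter names `unique` and `store_8bits` are hypothetical stand-ins for x-symbol-unique and x-symbol-8bits, and the rule that non-canonical tokens stay undecoded is an assumption inferred from the purpose of avoiding normalization. The rule for 8bit characters follows the description above.

```python
# Hypothetical miniature tables; X-Symbol's real tables are far larger.
DECODE = {r"\neq": "notequal", r"\ne": "notequal", r"\pm": "plusminus"}
CANONICAL = {"notequal": r"\neq", "plusminus": r"\pm"}
# Characters assumed to have an 8bit code in the file coding (here: Latin-1).
EIGHT_BIT = {"plusminus"}


def decode(tokens, unique=False, store_8bits=False):
    """Decode tokens to character names, optionally in 'unique' fashion."""
    result = []
    for tok in tokens:
        char = DECODE.get(tok)
        if char is None:
            result.append(tok)   # not a known token
        elif unique and CANONICAL[char] != tok:
            result.append(tok)   # assumption: keep non-canonical tokens as text
        elif unique and store_8bits and char in EIGHT_BIT:
            result.append(tok)   # would be decoded to an 8bit character
        else:
            result.append(char)
    return result


tokens = [r"\ne", r"\neq", r"\pm"]
print(decode(tokens))
print(decode(tokens, unique=True))
print(decode(tokens, unique=True, store_8bits=True))
```

In the last call, \pm stays a token: since 8bit characters are stored unencoded, a decoded plusminus would be saved as an 8bit character and the original token would be lost.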
3.2.5 Conversion Commands
First the good news: most of the time, the necessary conversions are performed automatically when you would expect them to be performed:
Nevertheless, you might want to perform the conversions explicitly in some situations by using one of the following commands (also to be found in the menu):
x-symbol-8bits is relative to the file coding, see 3.2.3 Store or Encode 8bit Characters.
All commands work on the region if it is active, or the (narrowed part of the) buffer if no region is active.
If the file coding is the same as the default coding, the variants with and without recoding (see section 3.2.2 File Coding of 8bit Characters) do the same. The variants with recodings are the ones used when doing the conversion automatically. The variants without recodings are the ones used when using the special Copy & Paste commands presented in the next subsection.
3.2.6 Copy & Paste with Conversion
You probably use X-Symbol because you want to produce some non-ASCII characters in your final document, but you are not really interested in what kind of token you would need to write. (After all, you do not use a hex editor to produce documents using some non-ASCII encoding in the file, since you are not interested in the byte sequence of individual characters.)
Consequently, all editing operations really work on characters, not on the corresponding tokens for the token language of the current buffer. This includes copying and pasting: if you copy the character plusminus from a LaTeX buffer to an HTML buffer, you really copy that character and not the three characters of the TeX macro \pm.
If you copy text to a buffer where X-Symbol is not enabled, like a mail buffer, that is probably not what you want. Similarly, you would probably like to see the X-Symbol characters for tokens in a text which you have copied from such a buffer. Therefore, X-Symbol provides the following commands (also to be found in the menu):
kill-ring with all X-Symbol characters encoded as by M-x x-symbol-encode, i.e., without recoding.
kill-ring and decode the inserted text as by M-x x-symbol-decode, i.e., without recoding.
You could get the same result with the usual copy & paste commands and the conversion commands from the previous section (see section 3.2.5 Conversion Commands), but this would clutter the undo information of the current buffer and would require an additional undo operation for the copy.
3.2.7 Character Aliases
A character alias or char alias is a character which is also a character in a font with another registry, e.g., adiaeresis is defined in all supported Latin fonts. Emacs distinguishes between these five characters. In package X-Symbol, one of them, with x-symbol-default-coding (see section 3.2.1 Normal File and Default Encoding) if possible, is supported by the input methods; the other ones are char aliases to the supported one.
The reason is that it would be confusing for the user to choose among different adiaeresis characters, and that there are different adiaeresis characters neither in Unicode nor in the token representations of languages tex and sgml.
8bit characters in files with a file coding x-symbol-coding other than x-symbol-default-coding are converted to the "normal" form. E.g., if you have a Latin-1 font by default, the adiaeresis in a Latin-2 encoded file is a Latin-1 adiaeresis in the buffer. When saving the buffer, it is again the right 8bit character in the Latin-2 encoded file.
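The relationship between the five encodings can be checked with Python's standard codecs. This sketch only illustrates the idea of one character shared by several 8bit encodings and of recoding between a file coding and a default coding; it does not use or model any X-Symbol function, and the helper name `recode` is hypothetical.

```python
ENCODINGS = ["iso-8859-1", "iso-8859-2", "iso-8859-3", "iso-8859-9", "iso-8859-15"]

# Byte 0xE4 is adiaeresis in all five encodings; Unicode has only one such character.
decoded = {enc: b"\xe4".decode(enc) for enc in ENCODINGS}
print(set(decoded.values()) == {"\u00e4"})  # True


def recode(data, file_coding, default_coding):
    """Re-express 8bit file contents in the default coding."""
    return data.decode(file_coding).encode(default_coding)


# scaron (U+0161) has different byte values in Latin-2 and Latin-9.
latin2_bytes = "\u0161".encode("iso-8859-2")
latin9_bytes = recode(latin2_bytes, "iso-8859-2", "iso-8859-15")
print(latin2_bytes != latin9_bytes)                                        # True
print(recode(latin9_bytes, "iso-8859-15", "iso-8859-2") == latin2_bytes)   # True
```

The round trip in the last line corresponds to visiting a Latin-2 file with a different default coding and saving it again: the file receives the same byte it had before.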
Thus, in normal cases, buffers do not have char aliases. In Emacs with Mule support, this is only possible if you copy characters from buffers with characters considered as char aliases by package X-Symbol, e.g., from the Mule file `european.el'. In XEmacs without Mule support, this is only possible if you use commands like C-q 2 3 4.
If you have char aliases in the current buffer, you might want to use (it is not really necessary, just when searching for characters):
A single char alias before point can be resolved by command x-symbol-modify-key and x-symbol-rotate-key, see 4.7 Input Method Context: Replace Char Sequence.
The XEmacs package latin-unity provides a command to "remap" characters to one character set (if possible). X-Symbol's unaliasing can be seen as remap operations to a fixed sequence of character sets.
3.3 Minor Mode
X-Symbol is a minor mode (see section `Minor Modes' in XEmacs User's Manual) which enables the features mentioned in this manual:
With the default installation, X-Symbol mode is automatically turned on when it is appropriate to do so (see below for details). You can control it individually with the following command:
By default, X-Symbol mode is disabled in special major-modes visiting a file, e.g., vm-mode (see section 8.4.12 How to Use X-Symbol with Gnus or VM). Use a prefix argument to be asked whether to turn it on anyway.
Turning X-Symbol mode on requires that you have a valid token language for the current buffer. Since turning X-Symbol mode on also decodes tokens, it is also useful to set the variables which control the conversion (see section 3.2 Conversion: Decoding and Encoding).
Since people usually do not want to write some Emacs Lisp functions to do some customizations, X-Symbol provides the following variables which induce X-Symbol to set the necessary buffer-local variables when X-Symbol is turned on:
x-symbol-auto-style-alist
x-symbol-language (see section 3.1 Token Language), indicated in the modeline, e.g. `tex',
x-symbol-mode, i.e., whether it is appropriate to turn on X-Symbol mode automatically,
x-symbol-coding (see section 3.2.2 File Coding of 8bit Characters), indicated in the modeline if different from the default coding, e.g. `-l2' for Latin-2,
x-symbol-8bits (see section 3.2.3 Store or Encode 8bit Characters), indicated in the modeline by `8',
x-symbol-unique (see section 3.2.4 Unique Decoding), indicated in the modeline by `*',
x-symbol-subscripts (see section 5.1 Super- and Subscripts), indicated in the modeline by `s',
x-symbol-image (see section 5.2 Images at the end of Image Insertion Commands), indicated in the modeline by `i',
x-symbol-lang-modes
x-symbol-lang-auto-style
x-symbol-mode, x-symbol-coding, x-symbol-8bits, x-symbol-unique, x-symbol-subscripts, and x-symbol-image if not already buffer-local.
x-symbol-auto-mode-suffixes
x-symbol-modeline-state-list
The menu might also include individual entries for a token language (see section 6.2.1 Basics of Language "TeX macro"):
x-symbol-lang-extra-menu-items
3.4 Poor Man's Mule: Running Under XEmacs/no-Mule
Using XEmacs/no-Mule normally means that you are restricted to using no more than 256 different characters in your documents.
Package X-Symbol provides a lot more characters which can also be used with XEmacs/no-Mule. Internally, all X-Symbol characters except the ones of your default font (see section 3.2.1 Normal File and Default Encoding) are represented by two characters, see 7.1 Internal Representation of X-Symbol Characters.
This can lead to a lot of problems, which are resolved by the following methods (some annoyances remain, see section 8.1 Problems under XEmacs/no-Mule) when X-Symbol mode is turned on (see section 3.3 Minor Mode):
font-lock is used to display these two-character sequences with the correct fonts. The potential problem lies in the set-up of the corresponding font-lock keywords, see 3.5 The Role of font-lock.
3.5 The Role of font-lock
Package X-Symbol uses package font-lock to display super- and subscripts (see section 5.1 Super- and Subscripts) and to display its special characters under XEmacs/no-Mule (see section 3.4 Poor Man's Mule: Running Under XEmacs/no-Mule). Thus, you should enable font-lock in buffers where you want to use X-Symbol (it is by default). See section 2.6.2 Syntax Highlighting Packages (font-lock and add-ons).
When X-Symbol mode is turned on, it automatically adds the necessary font-lock keywords to the buffer-local value of font-lock-keywords and all font-lock keywords which are commonly used with the current token language.
Setting all font-lock keywords is important since font-lock might not yet have been turned on, or since you might want to change font-lock's decoration of the current buffer after X-Symbol has been turned on.
Please note that switching the mode by typing M-x latex-mode does not set LaTeX's font-lock keywords! They are set at the end of C-x C-f. If you switch the mode, turn on font-lock yourself.
Independently from package X-Symbol, the following command might be useful in some situations:
3.6 Character Group and Token Classes
Each X-Symbol character belongs to a character group, e.g., natnums belongs to setsymbol. A character group should consist of similar characters, where "similar" means similar meaning, not similar appearance. Two characters which have nearly the same appearance should be in the same group, though. The group determines:
The character group is independent from any token language, but is probably somewhat related to some of its token classes. For each token language, each character is assigned to a list of token classes, which can be used for the following:
The token classes for individual token languages are explained in the corresponding sections of 6. Supported Token Languages:
x-symbol-lang-header-groups-alist
x-symbol-lang-class-alist
x-symbol-lang-class-face-alist