Identify encoding

Character sets and encodings determine how letters and symbols are represented numerically on your computer. There are a great many different representations used in different countries and for different languages.

The GEDCOM 5.5 specification supports only ANSEL, UNICODE and ASCII, and the draft 5.5.1 specification adds UTF-8. GWintree supports all of these, and for convenience also ANSI, and converts data automatically between them as required.

If your GEDCOM file specifies an encoding other than one of these supported encodings, you will need to identify which of the available encodings should be used when the file is read. The ones you are most likely to encounter are MACINTOSH and IBMPC. In these cases GWintree makes a selection for you ('MACINTOSH' and codepage 437 respectively), which should usually be correct.
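As an illustration of how a declared encoding might be mapped to a default choice, here is a minimal Python sketch. It is not GWintree's own code: the function names and the DEFAULT_CODECS table are hypothetical, the choice of codepage 1252 for ANSI is an assumption, and ANSEL is left out because Python has no built-in ANSEL codec. It also assumes the header itself is stored one byte per character.

```python
import re
from typing import Optional

# Hypothetical table: GEDCOM CHAR value -> Python codec name.
DEFAULT_CODECS = {
    "ASCII": "ascii",
    "UTF-8": "utf-8",
    "UNICODE": "utf-16",
    "ANSI": "cp1252",        # assumption: Western European Windows codepage
    "IBMPC": "cp437",
    "MACINTOSH": "mac_roman",
}

def declared_encoding(header: bytes) -> Optional[str]:
    """Return the value of the first level-1 CHAR line, or None if absent."""
    # latin-1 maps every byte to a character, so this never fails;
    # the tag names themselves are plain ASCII.
    text = header.decode("latin-1")
    match = re.search(r"^\s*1\s+CHAR\s+(\S+)", text, flags=re.MULTILINE)
    return match.group(1).upper() if match else None

def default_codec(header: bytes) -> Optional[str]:
    """Return a default codec for the declared encoding, or None if the
    declaration is missing or not in the table (the user must then choose)."""
    name = declared_encoding(header)
    return DEFAULT_CODECS.get(name) if name else None
```

A result of None corresponds to the situation in which the encoding has to be identified by hand.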

Where a GEDCOM file specifies its encoding as ANSI, the data will be read automatically by GWintree. However, the term 'ANSI' may refer to any of a range of Windows codepages, and a file produced in Eastern Europe, for instance, may not display correctly in Western Europe or the U.S.A. In this case, again, you will need to identify which encoding should be used when the file is read. Similarly, MACINTOSH could refer to one of many different codepages.
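To see why the choice of codepage matters, the following Python sketch decodes one and the same byte under four single-byte encodings. The codec names are those of Python's standard library, not GWintree's, and the byte value is chosen purely for illustration.

```python
import unicodedata

raw = bytes([0xF1])  # a single byte above the ASCII range

# Two Windows ("ANSI") codepages, the IBM PC codepage and Mac Roman.
CODECS = ("cp1252", "cp1250", "cp437", "mac_roman")

# Decode the same byte under each codec.
readings = {codec: raw.decode(codec) for codec in CODECS}

for codec, ch in readings.items():
    # Print the code point and its Unicode name rather than the glyph,
    # so the output is readable on any terminal.
    print(f"{codec:10} U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Each codec yields a different character for the same byte, which is why a file written under one codepage can display incorrectly under another.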

When the Identify encoding dialog has been opened, either automatically or by choosing Encoding... from the Tools menu, the encoding reported in the GEDCOM file is shown on the left-hand side, and the combo box on the right-hand side lists all the available encodings provided on your system, as well as those provided by GWintree.

The encoding which you select takes effect when the file is loaded, and determines how the data is interpreted. If the file is subsequently saved, it must be saved using one of the supported encodings listed above: by default it will be saved using UTF-8.

The encoding you select does not change the data in the file, only the way it is interpreted, so long as the file is not saved. If you make the wrong selection, you can simply open the dialog again and choose another. Note, however, that clicking OK in the dialog causes the file to be reloaded, so any edits you have made since loading it will be lost. If the file is saved while the wrong encoding is selected, your data may be damaged.
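The difference between a harmless wrong selection and a damaging one can be sketched in Python as follows. This only illustrates the principle, using standard-library codecs and an invented name. It does not show how GWintree reads or saves files.

```python
original = "Dvořák"

# The bytes as they might sit on disk in a file written under
# the Central European Windows codepage.
on_disk = original.encode("cp1250")

# Reading with the wrong codepage gives the wrong characters,
# but the bytes on disk are untouched...
wrong_reading = on_disk.decode("cp1252")

# ...so choosing again with the right codepage recovers the name.
right_reading = on_disk.decode("cp1250")

# Saving the wrong reading as UTF-8 writes the wrong characters to disk.
# Reloading the saved bytes as UTF-8 now reproduces the error faithfully.
saved = wrong_reading.encode("utf-8")
reloaded = saved.decode("utf-8")

print(right_reading == original)  # the unsaved file is still recoverable
print(reloaded == original)       # the saved file no longer holds the name
```

Once the wrong reading has been saved, the original bytes are gone, and recovery depends on knowing both the encoding that was wrongly selected and the one that should have been.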

If your file contains only the standard characters used in English (letters, digits, punctuation marks and so on, all of which fall within ASCII codes 0 to 127), these are represented identically in many different encodings, including ANSEL, UTF-8, codepage 1252, codepage 437 and MACINTOSH, and choosing any of these will cause no change to your data. If your file contains accented characters, currency symbols (apart from the dollar), bullets, or characters from other scripts, such as Cyrillic, the correct encoding must be identified for these characters to be displayed correctly.
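A rough way to check whether a file falls into the first category is sketched below in Python. The function names are hypothetical, and ANSEL is omitted from the comparison because Python's standard library has no ANSEL codec.

```python
# Encodings from the list above that Python provides out of the box.
CODECS = ("cp1252", "utf-8", "cp437", "mac_roman")

def is_ascii_only(data: bytes) -> bool:
    """True if every byte lies in the ASCII range 0 to 127."""
    return all(b < 128 for b in data)

def decodes_identically(data: bytes) -> bool:
    """True if every codec in CODECS yields the same text for the data."""
    try:
        results = {data.decode(codec) for codec in CODECS}
    except UnicodeDecodeError:
        # At least one codec cannot read the bytes at all.
        return False
    return len(results) == 1

plain = b"1 NAME John /Smith/"
accented = "1 NAME José /Núñez/".encode("cp1252")

print(is_ascii_only(plain), decodes_identically(plain))
print(is_ascii_only(accented), decodes_identically(accented))
```

For the plain record the choice of encoding makes no difference, whereas the accented record is read correctly only under the encoding in which it was written.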