GNU tar and POSIX tar
GNU tar was based on an early draft of the POSIX 1003.1 ustar
standard. GNU extensions to tar, such as the support for file names
longer than 100 characters, use portions of the tar header record
which were specified in that POSIX draft as unused. Subsequent changes
in POSIX have allocated the same parts of the header record for other
purposes. As a result, GNU tar is incompatible with the current POSIX
spec, and with tar programs that follow it.
We plan to reimplement these GNU extensions in a new way which is
upward compatible with the latest POSIX tar format, but we don't know
when this will be done.
In the meantime, there is simply no telling what might happen if you
read a GNU tar archive that uses the GNU extensions with some other
tar program. So if you want to read the archive with another tar
program, be sure to write it using the `--old-archive' option (`-o').
@FIXME{is there a way to tell which flavor of tar was used to write a particular archive before you try to read it?}
Traditionally, old tars limit file names to 100 characters. GNU tar
attempted two different approaches to overcome this limit, using and
extending a format specified by a draft of P1003.1. The first
approach, which involved `@MaNgLeD@' file names or the like, was not
very successful; the second used `././@LongLink' and other tricks,
with better success. In theory, GNU tar should be able to handle file
names of practically unlimited length. So, if GNU tar fails to dump
and retrieve files whose names have more than 100 characters, then
there is indeed a bug in GNU tar.
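As an illustration of the second approach, the sketch below uses
Python's standard tarfile module (an independent implementation, not
GNU tar itself) to write a member whose name exceeds 100 characters in
GNU format, and then inspects the first 512-byte header block. The
helper name first_header_block and the sample path are made up for
this example.

```python
import io
import tarfile


def first_header_block(member_name: str) -> bytes:
    """Write one empty member in GNU format and return the first 512-byte block."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as archive:
        info = tarfile.TarInfo(name=member_name)
        info.size = 0  # empty member, so no file object is needed
        archive.addfile(info)
    return buf.getvalue()[:512]


# Hypothetical path of 128 characters, longer than the 100-byte name field.
long_name = "d/" * 60 + "file.txt"

header = first_header_block(long_name)
name_field = header[0:100].rstrip(b"\0")  # name field occupies bytes 0..99
typeflag = header[156:157]                # type flag is the byte at offset 156

print(name_field, typeflag)
```

The first block is not the member's own header: it is an extra header
carrying the placeholder name, followed by a data block holding the
real name. A tar program unaware of this convention would see a member
called `././@LongLink', which is the failure mode this section warns
about.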
But under strict POSIX, the limit was still 100 characters. For
various other purposes, GNU tar used areas left unassigned in the
POSIX draft. POSIX later revised the P1003.1 ustar format by assigning
previously unused header fields, in such a way that the upper limit
for file name length was raised to 256 characters. However, the actual
POSIX limit oscillates between 100 and 256, depending on the precise
location of slashes in the full file name (this is rather ugly). Since
GNU tar uses the same fields for quite different purposes, it became
incompatible with the latest POSIX standards.
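To make the oscillating limit concrete, here is a minimal sketch of
the ustar split rule, assuming a 100-byte name field, a 155-byte
prefix field, and ASCII-only paths. The function name ustar_split is
hypothetical, and special cases such as trailing slashes on directory
names are ignored.

```python
from typing import Optional, Tuple

NAME_MAX = 100    # size of the ustar name field
PREFIX_MAX = 155  # size of the ustar prefix field


def ustar_split(path: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, name) if the path fits a ustar header, else None.

    Assumes ASCII paths, so character counts equal byte counts.
    """
    if len(path) <= NAME_MAX:
        return "", path
    # The path must be cut at a slash; the slash itself is not stored.
    for i, ch in enumerate(path):
        if ch == "/":
            prefix, name = path[:i], path[i + 1:]
            if 0 < len(prefix) <= PREFIX_MAX and 0 < len(name) <= NAME_MAX:
                return prefix, name
    return None


# 256 characters, with the slash exactly in the 156th position: fits.
best_case = "d" * 155 + "/" + "f" * 100
# Only 152 characters, but the part after the only slash has 101: does not fit.
bad_slash = "d" * 50 + "/" + "f" * 101
# 150 characters without any slash: does not fit.
no_slash = "a" * 150

print(ustar_split(best_case) is not None)
print(ustar_split(bad_slash))
print(ustar_split(no_slash))
```

A 256-character path can be stored while a 152-character one cannot,
purely because of where the slash falls.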
For longer or non-fitting file names, we plan to use yet another set
of GNU extensions, but this time, complying with the provisions POSIX
offers for extending the format, rather than conflicting with it.
Whenever an archive uses the old GNU tar extension format or POSIX
extensions, whether for very long file names or other specialities,
this archive becomes non-portable to other tar implementations.
In fact, anything can happen. The most forgiving tars will merely
unpack the file using a wrong name, and maybe create another file
named something like `@LongName', with the true file name in it. tars
that do not protect themselves may crash with a segmentation
violation!
Compatibility concerns make all of this more difficult, as we will
have to support all these things together for a while. GNU tar should
be able to produce and read true POSIX format files, while being able
to detect old GNU tar formats, besides old V7 format, and process them
conveniently. It will take years before this whole area stabilizes...
There are plans to raise this 100-character limit to 256, and yet
produce POSIX-conformant archives. Past 256, I do not know yet if GNU
tar will go non-POSIX again, or merely refuse to archive the file.
There are plans for GNU tar to support the latest POSIX format more
fully, while being able to read old V7 format, GNU (semi-POSIX plus
extension), as well as full POSIX. One may ask if there is part of
the POSIX format that we still cannot support. This simple question
has a complex answer. Maybe, on closer inspection, some strong
limitations will pop up, but until now, nothing sounds too difficult
(but see below). I only have these few pages of POSIX telling about
`Extended tar Format' (P1003.1-1990 -- section 10.1.1), and there are
references to other parts of the standard I do not have, which should
normally enforce limitations on stored file names (I suspect things
like fixing what / and NUL mean). There are also some points which
the standard does not make clear. Existing practice will then drive
what I should do.
POSIX mandates that, when a file name cannot fit within 100 to
256 characters (the variance comes from the fact a / is ideally
needed as the 156th character), or a link name cannot fit within 100
characters, a warning should be issued and the file not be stored.
Unless some --posix option is given (or POSIXLY_CORRECT is set), I
suspect that GNU tar should disobey this specification, and
automatically switch to using GNU extensions to overcome file name or
link name length limitations.
There is a problem, however, which I have not yet studied closely.
Given a truly POSIX archive with names having more than 100 characters,
I guess that GNU tar up to 1.11.8 will process it as if it were an
old V7 archive, and be fooled by some fields which are coded differently.
So, the question is whether the next generation of GNU tar should
produce POSIX format by default, whenever possible, producing archives
that older versions of GNU tar might not be able to read correctly. I
fear that we will have to suffer such a choice one of these days, if
we want GNU tar to go closer to POSIX. We can rush it.
Another possibility is to produce the current GNU tar format by
default for a few years, but have GNU tar versions from some 1.POSIX
and up able to recognize all three formats, and let older GNU tar fade
out slowly. Then, we could switch to producing POSIX format by
default, with not much harm to those still having (very old at that
time) GNU tar versions prior to 1.POSIX.
POSIX format cannot represent very long names, volume headers,
splitting of files in multi-volumes, sparse files, and incremental
dumps; these would all be disallowed if --posix is given or
POSIXLY_CORRECT is set. Otherwise, if tar is given long names, or
`-[VMSgG]', then it should automatically go non-POSIX. I think this
is easily granted without much discussion.
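The policy proposed in this paragraph can be sketched as a small
decision function. This is only an illustration of the rule as stated
here, not code from GNU tar: the function name choose_format, the
feature labels, and the default_format parameter (standing in for
whatever default is eventually chosen) are all assumptions.

```python
import os
from typing import Iterable, Mapping, Optional

# Hypothetical labels for the features this paragraph says POSIX cannot represent.
GNU_ONLY_FEATURES = frozenset(
    {"long-name", "label", "multi-volume", "sparse", "incremental"}
)


def choose_format(
    requested: Iterable[str],
    posix_option: bool = False,
    environ: Optional[Mapping[str, str]] = None,
    default_format: str = "gnu",
) -> str:
    """Pick an archive format following the rule sketched in the text."""
    env = os.environ if environ is None else environ
    strict = posix_option or "POSIXLY_CORRECT" in env
    needed = set(requested) & GNU_ONLY_FEATURES
    if needed and strict:
        # Strict POSIX mode: GNU-only features are refused outright.
        raise ValueError("not representable in POSIX format: " + ", ".join(sorted(needed)))
    if needed:
        # Not strict: silently switch to the GNU format.
        return "gnu"
    return "posix" if strict else default_format


print(choose_format({"sparse"}, environ={}))
print(choose_format(set(), posix_option=True, environ={}))
```

Passing environ explicitly keeps the function testable; when it is
omitted, the real process environment is consulted.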
Another point is that only mtime is stored in POSIX archives, while
GNU tar currently also stores atime and ctime. If we want GNU tar to
go closer to POSIX, my choice would be to drop atime and ctime support
in the usual case. On the other hand, I perceive that full dumps or
incremental dumps need atime and ctime support, so for those special
applications, POSIX has to be avoided altogether.
A few users requested that --sparse (-S) be always active by
default. I think that before replying to them, we have to decide
if we want GNU tar to go closer to POSIX on average, while producing
files. My choice would be to go closer to POSIX in the long run.
Besides possible double reading, I do not see any reason not to try
saving files as sparse when creating archives which are neither POSIX
nor old-V7, so the actual --sparse (-S) would become selected by
default when producing such archives, whatever the reason is. So,
--sparse (-S) alone might be redefined to force GNU-format archives,
and recover its previous meaning from this fact.
GNU-format as it exists now can easily fool other POSIX tar, as it
uses fields which POSIX considers to be part of the file name prefix.
I wonder if it would not be a good idea, in the long run, to try
changing GNU-format so any added field (like ctime, atime, file offset
in subsequent volumes, or sparse file descriptions) be wholly and
always pushed into an extension block, instead of using space in the
POSIX header block. I could manage to do that portably between future
GNU tars. So other POSIX tars might be at least able to provide kind
of correct listings for the archives produced by GNU tar, if not able
to process them otherwise.
Using these projected extensions might induce older tars to fail.
We would use the same approach as for POSIX. I'll put out a tar
capable of reading POSIXier, yet extended archives, but will not produce
this format by default, in GNU mode. In a few years, when newer GNU
tars will have flooded out tar 1.11.X and previous, we could switch to
producing POSIXier extended archives, with no real harm to users, as
almost all existing GNU tars will be ready to read POSIXier format. In
fact, I'll do both changes at the same time, in a few years, and just
prepare tar for both changes, without effecting them, from 1.POSIX.
(Both changes: 1--using POSIX convention for getting over 100
characters; 2--avoiding mangling POSIX headers for GNU extensions,
using only POSIX mandated extension techniques).
So, a future tar will have a --posix flag forcing the usage of truly
POSIX headers, and so, producing archives that previous versions of
GNU tar will not be able to read.
So, once a pretest release announces that feature, it would be
particularly useful for users to test how exchangeable archives are
between GNU tar with --posix and other POSIX tar.
In a few years, when GNU tar produces POSIX headers by default,
--posix will have a strong meaning and will disallow GNU extensions.
But in the meantime, for a long while, --posix in GNU tar will not
disallow GNU extensions like --label=archive-label (-V archive-label),
--multi-volume (-M), --sparse (-S), or very long file or link names.
However, --posix with GNU extensions will use POSIX headers with
reserved-for-users extensions to headers, and I will be curious to
know how well or badly POSIX tars will react to these.
GNU tar prior to 1.POSIX, and after 1.POSIX without --posix,
generates and checks `ustar  ', with two suffixed spaces. This is
sufficient for older GNU tar not to recognize POSIX archives, and
consequently, wrongly decide those archives are in old V7 format. It
is a useful bug for me, because GNU tar has other POSIX
incompatibilities, and I need to segregate GNU tar semi-POSIX archives
from truly POSIX archives, for GNU tar should be somewhat compatible
with itself, while migrating closer to latest POSIX standards. So,
I'll be very careful about how and when I will do the correction.
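The magic string makes it possible to guess an archive's flavor from
its first header block before trying to read it. The sketch below
assumes the usual layout: the magic field starts at byte offset 257,
POSIX archives carry "ustar" followed by a NUL and the version "00",
and GNU archives carry "ustar" followed by two spaces and a NUL.
Anything else is reported as old V7 or unknown, which is a guess
rather than a positive identification. The function name
archive_flavor is hypothetical.

```python
MAGIC_OFFSET = 257
POSIX_MAGIC = b"ustar\0" + b"00"  # 6-byte magic followed by 2-byte version
GNU_MAGIC = b"ustar  \0"          # "ustar", two spaces, then NUL


def archive_flavor(header: bytes) -> str:
    """Classify a 512-byte tar header block by its magic field."""
    if len(header) < 512:
        raise ValueError("a tar header block is 512 bytes long")
    magic = header[MAGIC_OFFSET:MAGIC_OFFSET + 8]
    if magic == GNU_MAGIC:
        return "gnu"
    if magic == POSIX_MAGIC:
        return "posix"
    return "v7-or-unknown"


def make_header(magic: bytes) -> bytes:
    """Build an otherwise empty header block carrying the given 8-byte magic."""
    block = bytearray(512)
    block[MAGIC_OFFSET:MAGIC_OFFSET + 8] = magic
    return bytes(block)


print(archive_flavor(make_header(GNU_MAGIC)))
print(archive_flavor(make_header(POSIX_MAGIC)))
print(archive_flavor(bytes(512)))
```

In practice the header would come from the first 512 bytes of the
archive file, read in binary mode.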