[tex-live] Better ways to find packages and documentation [was: texdoc in luatex]

Norbert Preining preining at logic.at
Wed Jul 4 14:13:50 CEST 2007


Hi Florent,

On Tue, 03 Jul 2007, Florent Rougon wrote:
> I took a little bit of time to answer, because I wanted to look at the
> new TL infrastructure before (and also because I'm trying to have a life

Please send me your comments and suggestions for this, too (private or
here on list). We are still in the initial phase and everything is open.

> > puhhh, many thing are floating around. In fact it would be nice to
> > discuss all this stuff in person, which would make it much easier!
> 
> Do you want to visit Paris? :)

Yes. When? Do you have some logic/mathematics institute nearby so that I
can give a talk? That would make it an official trip ;-)

> When I read this texlive.tlpdb excerpt, it is not at all obvious (for a
> program) that foo-de.pdf has the attribute language='de'.

So the problem we/you are trying to solve is the additional tagging of
individual documents with language tags, right?

If, for now, we consider *only* CTAN package tagging (isn't that what
you proposed below?), then this problem can be ignored.

For a start, I would say that a system which lets me search for
packages by tag and spits out all/some of their documentation files,
without any further subdivision (what kind of doc file, etc.), would
already be a HUGE step forward, probably enough for all kinds of needs.
People *can* generally understand what a file is for, at least after the
first 2 pages.

> > Unfortunately we don't have a back mapping built into TeX Live, or at
> 
> [from TLPOBJ name to CTAN package name]
> 
> Huh?
> 
> I thought that in texlive.tlpdb:
>   - either the 'name' field indicates the CTAN package name;
>   - or there is a 'catalogue' field indicating the CTAN package name.

In theory, yes. But nobody has ever checked this!!! We have some
catalogue entries because I noticed that the names had been changed, but
in general we *just assume* that it is like this.

Until a package is updated on CTAN and we run it through ctan2tl, we
won't see the problem.

> FWIW, I couldn't find ctan2tl by browsing the TL repository through the
> web interface. Where is it?

http://www.tug.org/svn/texlive/trunk/Build/tools/ctan2tl

> > So in principle we can do:
> > 	CTAN package -> get texlive location ->
> > 	-> get TLPOBJ from texlive.tlpdb -> get docfiles from this
> > I can write you a perl script in 3 minutes that does this. (spits out
> > list of files, nothing else)
> 
> Yup, easy, but I want the Catalogue metadata carried with each doc file.

But didn't you propose that tagging be done at the package level of the
catalogue, not at the document level? In that case there is no need for
this, is there?
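
As an illustration of the lookup chain quoted above (CTAN package ->
TLPOBJ in texlive.tlpdb -> docfiles), here is a minimal Python sketch.
Python is used only because Python access modules come up elsewhere in
this thread. The stanza layout in SAMPLE (blank-line-separated entries,
"key value" lines, file paths indented under a "docfiles" line) is an
assumption made for this example, and the package names "foo" and
"bar-tl" are made up. TLPObj, parse_tlpdb and docfiles_for are
hypothetical helper names, not an existing API.

```python
from dataclasses import dataclass, field


@dataclass
class TLPObj:
    """One package stanza from a tlpdb-style file (assumed layout)."""
    name: str = ""
    catalogue: str = ""  # CTAN name, only set when it differs from 'name'
    docfiles: list = field(default_factory=list)

    @property
    def ctan_name(self):
        # Fall back to 'name' when there is no 'catalogue' field.
        return self.catalogue or self.name


def parse_tlpdb(text):
    """Parse blank-line-separated stanzas into a dict keyed by CTAN name."""
    packages = {}
    for stanza in text.strip().split("\n\n"):
        obj = TLPObj()
        section = None
        for line in stanza.splitlines():
            if line.startswith(" "):
                # Indented lines belong to the most recent key line.
                if section == "docfiles":
                    obj.docfiles.append(line.split()[0])
                continue
            key, _, value = line.partition(" ")
            section = key
            if key == "name":
                obj.name = value.strip()
            elif key == "catalogue":
                obj.catalogue = value.strip()
        if obj.name:
            packages[obj.ctan_name] = obj
    return packages


def docfiles_for(packages, ctan_name):
    """Return the doc files of a CTAN package, or an empty list."""
    obj = packages.get(ctan_name)
    return obj.docfiles if obj else []


# Made-up sample data in the assumed layout.
SAMPLE = """\
name foo
category Package
docfiles size=2
 texmf-dist/doc/latex/foo/foo.pdf
 texmf-dist/doc/latex/foo/foo-de.pdf
runfiles size=1
 texmf-dist/tex/latex/foo/foo.sty

name bar-tl
catalogue bar
docfiles size=1
 texmf-dist/doc/latex/bar/README
"""

if __name__ == "__main__":
    db = parse_tlpdb(SAMPLE)
    for path in docfiles_for(db, "bar"):
        print(path)
```

Note that the sketch simply trusts the 'name'/'catalogue' fields, so it
inherits the unchecked assumption discussed above: if a TL name does not
match the CTAN name and no 'catalogue' field is present, the lookup
silently returns nothing.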

> I don't think this is the best interface, because:

Of course ;-)

> 1) Can a given CTAN package be split among several TEXMF trees (in TL,
>    in MiKTeX, etc.)? Or rather, do we want to support that?

No.

> 2) Do you want the data for TL-available packages to be split into
>    individual files for each package, or gathered into big files?
> 
>    Advantage for the split version: it's easier to register/unregister a
>    package by distributors: just add or remove the corresponding files.
> 
>    Disadvantage: takes more space, clutters the filesystem.

Do you mean the data for the documentation, or something else?

> 3) Do you want to reduce data redundancy as much as possible?

Yes. The TeX Live CD will/can carry a copy of the catalogue (in some
form), and that copy should be used for further information.

>    I have the impression your answer is yes, in which case I'll need a
>    copy of the Catalogue somewhere on the filesystem to lookup the
>    metadata for each CTAN package and for each documentation file.


I wanted to write:

  We want to reduce data redundancy *in the source packages*!!!

  The TLPOBJ files *CAN* (and hopefully will) be enriched with additional
  information from the catalogue, but first I have to write a catalogue
  access Perl module (and read xml, grrrr ;-).

  We want to include at least
  - title/long description
  - some version/license information
  - (tagging information?)
  - ...

but then I remembered that I had had a discussion with Karl about this,
so I changed what I wrote above. Karl, am I right here?!

>    If the answer is yes, then I need one copy of the Catalogue somewhere
>    and I have to make it easy for third parties (users, distributors) to
>    extend its data when they install a package that is not referenced in
>    the copy of the Catalogue they have on disk (can usually be done by
>    dropping files in a directory).

I can imagine that additionally installed packages (in TEXMFLOCAL) drop
their description files into TEXMFLOCAL/somewhere.

> 4) Do you want to edit files in-place when a package is added or
>    removed?
> 
>    (similar to question 2, but not for available TL packages, rather to
>    tell whether a given package is installed or not).
> 
>    My tool needs to be able to tell whether a package is installed or
>    not. This way, the user can choose to either browse the whole TL

The plan is that every local installation will have a 
	local.tlpdb
(name to be changed) containing *only* those packages which are
installed.

Of course this doesn't handle the TEXMFLOCAL files.

> 5) Do we want to be able to tag individual documentation files, or only
>    CTAN packages?
[..]
>    It is quite possible that we don't need to go so far as tagging
>    individual doc files:

ACK, and thus I would also ignore the problems you mentioned above
about tagging individual files with language tags.

There are already *some* packages on CTAN with a language in their
name, e.g.,
	lshort-russian
and some others. I suggest that we need neither the complexity of
single-file tagging nor language tags for single files.

>        Either we accept that (because such packages are rare, or because
>        choosing from a list of 10 documents is deemed acceptable), or we
>        don't. If we don't, there are two possibilities:
>           - tag individual documents, not only CTAN packages (see above);
>           - or split such CTAN packages so that each CTAN package is
>             specific enough for its tags to be relevant.

As I said, that is acceptable to me. And as it looks, it will be our
decision anyway what to implement. BTW, splitting CTAN packages is *not*
a good idea.

> (well, there is another question as I see from the rest of your mail: do
> you prefer XML or RFC-2822 format? I saw you have some grief about XML,

You have seen the texlive.tlpdb, I guess you know the answer ;-)

> > And in need of a *good* programmer like you to help a bit ;-))))
> 
> Sorry, I am genetically unsuited to work in Perl. ;-)

I don't believe you. Well, btw, if you can write Python access modules
for the TLPSRC/TLPOBJ/TLPDB/TLTREE/... that would be great, too. In fact,
for *application* (i.e., installer/updater, etc.) purposes, only access
modules for
	TLPOBJ and TLPDB
would be necessary. The rest is only for us at obj generation time.

> You mean, there will be an unacceptable performance hit if anything in
> this design causes a program to read one file per CTAN package? Because
> of DVD head movements and things like that?

Yes.

> What I was thinking about at first was something quite similar to
> texlive.tlpdb, but with the metadata for each package and doc file. Such
> a file would be generated once at TL installation and also whenever a
> package is installed or removed (unless it doesn't contain the
> "installed" status, in which case it need only be generated at TL
> installation).
> 
> But as explained above, we can also reduce data redundancy by having one
> copy of the Catalogue in some known place on the filesystem (and provide
> support for extending this data by administrators and distributors).

I could even imagine that, if we do the tagging at the package level,
we add the tags to the TLPDB. Then we would have the TLPDB of installed
stuff, the TLPDB of available stuff on the TeX Live installation media,
and the additional .xml/whatever files dropped into TEXMFLOCAL.

> The "big files" I was talking about were not supposed to be stored in
> the TL repository, but rather generated by the TL installer (or Debian
> scripts). In this case, they cannot be out-of-date with respect to what
> is installed.

Yes, the installer creates a local.tlpdb for every installation.


So my proposal is:
- tagging is done on a per package level, not per file level
- tags are stored in the Catalogue
- tags are defined either by (format to be specified) files in the CTAN
  dir as uploaded by the author, or via the web interface, or by the
  CTAN maintainers
- tags are taken from the Catalogue when generating the texlive.tlpdb
  to be shipped, and stored there
- locally installed packages can ship (format to be specified) files 
  in TEXMFLOCAL/(location to be specified)
- the doc search program takes the info/tags from:
	- the texlive.tlpdb as shipped on the DVD
	- the local.tlpdb of installed packages
	- the additional files in TEXMFLOCAL
  and presents the info in some structured way (this would allow people
  to search also in *not yet installed* packages, or only among those
  which are already installed).
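
To make the last point of the proposal concrete, here is a rough Python
sketch of how a doc search program could merge the three sources and
filter by tag. Everything about the data is hypothetical: the package
names, the tags, and the idea that each source has already been parsed
into a dict mapping package name -> (tags, docfiles). The on-disk
formats are still "to be specified", so no parsing is attempted here,
and PackageInfo, build_index and search are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageInfo:
    name: str
    tags: frozenset
    docfiles: tuple
    installed: bool


def build_index(shipped, installed, local_extra):
    """Merge the three sources into one dict keyed by package name.

    Each argument maps package name -> (tags, docfiles).
    """
    index = {}
    for name, (tags, docs) in shipped.items():
        index[name] = PackageInfo(
            name, frozenset(tags), tuple(docs), name in installed
        )
    for source in (installed, local_extra):
        for name, (tags, docs) in source.items():
            # Entries from local.tlpdb or TEXMFLOCAL are on disk by definition.
            index[name] = PackageInfo(name, frozenset(tags), tuple(docs), True)
    return index


def search(index, wanted_tags, installed_only=False):
    """Return packages carrying all wanted tags, sorted by name."""
    wanted = frozenset(wanted_tags)
    hits = [
        p for p in index.values()
        if wanted <= p.tags and (p.installed or not installed_only)
    ]
    return sorted(hits, key=lambda p: p.name)


# Made-up example data standing in for the three sources.
shipped = {
    "foo": (["graphics"], ["doc/foo/foo.pdf"]),
    "bar": (["graphics", "tables"], ["doc/bar/bar.pdf"]),
}
installed = {"foo": (["graphics"], ["doc/foo/foo.pdf"])}
local_extra = {"mylocal": (["tables"], ["doc/mylocal/README"])}

if __name__ == "__main__":
    index = build_index(shipped, installed, local_extra)
    for pkg in search(index, ["graphics"]):
        status = "installed" if pkg.installed else "available"
        print(pkg.name, status, *pkg.docfiles)
```

Passing installed_only=True restricts the result to packages from
local.tlpdb and TEXMFLOCAL; leaving it off also lists packages that are
only available on the installation media.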

Best wishes

Norbert

-------------------------------------------------------------------------------
Dr. Norbert Preining <preining at logic.at>        Vienna University of Technology
Debian Developer <preining at debian.org>                         Debian TeX Group
gpg DSA: 0x09C5B094      fp: 14DF 2E6C 0307 BE6D AD76  A9C0 D2BF 4AA3 09C5 B094
-------------------------------------------------------------------------------
LYBSTER (n., vb.)
The artificial chuckle in the voice-over at the end of a supposedly
funny television commercial.
			--- Douglas Adams, The Meaning of Liff

