[gcs-pcs-list] Autodiscovery and embedding metadata
camster at citeulike.org
Mon Dec 20 07:23:57 EST 2004
Following on from Dan's email, I thought I'd kick things off on the
I've got a rather personal perspective on this, which is to say there's
an immediate application: I run a "social bookmarking" site for
academic papers (http://www.citeulike.org), and a lot of my time is
spent trying to solve the "URL to metadata problem".
The problem is that when one of my users finds an article on the web
(say on PubMed to give a concrete example), my server ends up with the
URL. I then fetch that page myself, and go about the business of trying
to figure out what the metadata associated with that article should be.
A lot of this involves some pretty horrendous scraping code, which is
fragile and an absolute pain to write.
If only, I thought, there was a standard way of embedding the data into
the HTML page itself. It seems Dan's been thinking about exactly the
same problem from a slightly different perspective, and proposed a
number of possible ways of doing this. I've tried to summarise his
options on a Wiki page:
If anyone has any thoughts, or can think of why any of these methods
would be unsuitable for a particular domain, shall we talk about it
on the list? Feel free to edit the Wiki too, but it might be wise to
reach some consensus on the list first.
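
To make the "URL to metadata problem" concrete, here is a minimal sketch
of what a consumer could do if pages carried their metadata in the HTML
itself. It assumes, purely for illustration, that the data sits in
<meta name="..." content="..."> tags with Dublin Core style names; that
is not necessarily one of the options on the Wiki, and the sample page,
the field names and the extract_metadata helper are all hypothetical.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        # Map each meta name to a list, since a field such as an
        # author can legitimately appear more than once.
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content is not None:
            self.metadata.setdefault(name, []).append(content)


def extract_metadata(html_text):
    """Return a dict of meta name -> list of content values."""
    parser = MetaTagExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.metadata


# Hypothetical page: the DC.* names are an assumed convention, not
# something any particular publisher is known to emit.
SAMPLE_PAGE = """
<html>
  <head>
    <title>An Example Article</title>
    <meta name="DC.title" content="An Example Article">
    <meta name="DC.creator" content="A. Author">
    <meta name="DC.creator" content="B. Author">
    <meta name="DC.date" content="2004-12-20">
  </head>
  <body><p>Article text here.</p></body>
</html>
"""

if __name__ == "__main__":
    for name, values in extract_metadata(SAMPLE_PAGE).items():
        print(name, values)
```

The point of the sketch is only that one generic parser would replace a
separate scraper per site; which embedding convention to use is exactly
the open question for the list.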
I know a lot of people will be on holiday at the moment, and mincemeat
and merriment are likely to be more interesting than talking about this,
so perhaps it's something we can pick up again in the new year?