On Thu, 20 Apr 2006, Rob Seaman wrote:
> Mark Taylor writes:
>
> > - VOTable doesn't allow you to mark columns which are essentially
> > numeric but are formatted as strings (e.g. sexagesimal angles)
> > as such
> > - For some purposes such a facility would be useful
> > - We should modify the VOTable standard accordingly
>
> This is, of course, a much broader issue than VOTable or even VO.
Certainly, but the fact that it hasn't been resolved outside VOTable doesn't in itself mean that we shouldn't try to address it within.
> > 1. Sexagesimal angle as hours:minutes:seconds (e.g. "23:04:46.5")
> > 2. Sexagesimal angle as degrees:minutes:seconds (e.g. "+15:12:19")
> > 3. Epoch as ISO-8601 (e.g. "2001-08-16T21:16:51.5")
>
> Note that IRAF (for instance) already treats a sexagesimal string
> literal as equivalent to a real number.
>
> On the other hand ISO-8601 represents a whole family of formats and
> is not limited to scalar timestamps, but may itself include ranges
> and such.
Fair point, which I had overlooked. The implication for my proposal would be that the representation label indicating ISO-8601 scalar timestamps should either be named something like "iso8601-scalar", or should be explicitly understood to mean that.
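To make "scalar timestamp" concrete, here is a minimal sketch in Python of the sort of rule such a label might carry. The function name and the exact pattern are my own assumptions for illustration, not anything taken from ISO-8601 or the VOTable standard, and the check looks only at the shape of the string (it would not reject month 13, for instance).

```python
import re

# Hypothetical rule for an "iso8601-scalar" label: a single calendar date,
# optionally followed by a time of day. Ranges ("a/b"), durations ("P1Y")
# and week or ordinal dates are deliberately excluded.
_ISO8601_SCALAR = re.compile(
    r"\d{4}-\d{2}-\d{2}"                    # YYYY-MM-DD
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?Z?)?"   # optional Thh:mm[:ss[.sss]][Z]
)


def is_iso8601_scalar(text):
    """Return True if text has the shape of one ISO-8601 timestamp."""
    return _ISO8601_SCALAR.fullmatch(text.strip()) is not None
```

With this rule "2001-08-16T21:16:51.5" is accepted, while an interval such as "2001-08-16/2001-08-17" or a duration such as "P1Y2M10D" is not.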
> > P1. Introduce a new attribute "representation" (or some other name)
> > for FIELD/PARAM to contain a special string indicating how values
> > in that column are to be interpreted. The special values
> > "hms", "dms" and "iso8601" (any more?) would be initially noted
> > along with rules for what counts as valid instances of those
> > representations.
>
> Have to wonder if this is really a step forward. Have certainly seen
> examples of mixed representation columns in the past. These may even
> be more frequent in the VO as users assemble tables as intermediate
> products resulting from multiple archives. A column might include
> both sexagesimal and decimal representations - potentially even mixed
> datatypes (floats versus strings, for instance) in some more quixotic
> applications.
Well, if you've got floats and strings in the same column then you can't represent it in a VOTable anyway. While I agree that one could in principle have heterogeneous data representations in a single column, I'd say that is somewhat contrary to the spirit of VOTable. More significantly, I don't think it's something that is often seen (I don't think I've ever come across a table along these lines).
> The list of representations is certainly longer than this - for just
> one example, add:
>
> "simple" numeric strings: "1.23456"
>
> I put that in quotes since you would have to distinguish not just
> integer versus floating point representations, but also fixed
> precision versus the family of scientific/engineering notations.
> There is also the issue of preserving precision. Projects and
> individuals often choose a string representation precisely because
> the values won't fit into a single precision float or long integer
> value.
This (and the complications you list), as well as the point that doing it rigorously is probably going to end up in STC, had occurred to me.
To clarify: this proposal is pragmatically motivated. I come across lots of VOTables which contain sexagesimal angles, and I have contacts in, e.g., solar physics who have lots of VOTables which contain timestamps in ISO-8601 format. I agree with Roy that it would be lovely if all angles were in radians and all times were MJDs, but for the data that's out there in VOTable form, it's often not the case. As a software author I can simply refuse to process such string-valued columns or (what I actually do) I can grub around in headers and column names and look at a few values and make a guess about what's in what format. The former is the more ideologically pure path but it tends not to impress users who can't plot anything.
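For illustration, the guesswork looks roughly like the following Python sketch. It is not the code of any actual tool; the function name, the patterns and the column-name hints are all assumptions made up for this example. The point to notice is that the values alone cannot tell hours from degrees, so the guess has to lean on weak hints such as a leading sign or the column name.

```python
import re

# Assumed patterns for illustration only.
_SEXAGESIMAL = re.compile(r"[+-]?\d{1,3}:\d{1,2}:\d{1,2}(\.\d*)?")
_ISO_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d*)?)?Z?")


def guess_representation(column_name, sample_values):
    """Guess "hms", "dms" or "iso8601" from a few values, or return None."""
    values = [v.strip() for v in sample_values if v.strip()]
    if not values:
        return None
    if all(_ISO_TIMESTAMP.fullmatch(v) for v in values):
        return "iso8601"
    if all(_SEXAGESIMAL.fullmatch(v) for v in values):
        name = column_name.lower()
        # A sign or a "dec"-like name suggests degrees.
        if any(v[0] in "+-" for v in values) or "dec" in name:
            return "dms"
        # An "ra"-like name suggests hours. This also matches "rate",
        # which is exactly the kind of fragility an explicit label avoids.
        if "ra" in name:
            return "hms"
    return None
```

A column called "angle" holding "10:20:30" comes back as None here, since nothing says whether it is hours or degrees.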
I agree that the proposal I'm making is not an aesthetically unimpeachable bit of data format design (though I don't think it's all that bad, at least P1), in that there will be special cases and exceptions which it looks like it ought to handle but which it either can't handle or handles imperfectly. However, I think that adding tags which mark these three string representations (hms, dms, iso8601) would make a lot of tables more useful than they are now, at the cost of a small change to the schema. (I didn't add Rob's "simple" numeric strings because I haven't seen this kind of data in many VOTables.)
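As a sketch of what a reader could do once a column carries one of these labels, the Python below converts each of the three representations to a number. The function names, the choice of degrees and MJD as output, and the restriction to colon-separated fields are assumptions for illustration; the proposal does not specify any of them. The timestamp conversion ignores time zones, time scales and leap seconds.

```python
from datetime import date


def sexagesimal_to_degrees(text, hours=False):
    """Convert "[+-]a:b:c" to decimal degrees (a is hours if hours=True)."""
    text = text.strip()
    # Take the sign from the string, so that "-00:30:00" stays negative.
    sign = -1.0 if text.startswith("-") else 1.0
    first, minutes, seconds = (float(p) for p in text.lstrip("+-").split(":"))
    value = first + minutes / 60.0 + seconds / 3600.0
    return sign * value * (15.0 if hours else 1.0)


def iso8601_to_mjd(text):
    """Convert "YYYY-MM-DD[Thh:mm[:ss]]" to a Modified Julian Date."""
    date_part, _, time_part = text.strip().rstrip("Z").partition("T")
    year, month, day = (int(p) for p in date_part.split("-"))
    # MJD counts days from 1858-11-17.
    days = date(year, month, day).toordinal() - date(1858, 11, 17).toordinal()
    seconds = 0.0
    if time_part:
        pieces = time_part.split(":")
        seconds = int(pieces[0]) * 3600 + int(pieces[1]) * 60
        seconds += float(pieces[2]) if len(pieces) > 2 else 0.0
    return days + seconds / 86400.0


def to_number(text, representation):
    """Dispatch on the (hypothetical) representation label of a column."""
    if representation == "hms":
        return sexagesimal_to_degrees(text, hours=True)
    if representation == "dms":
        return sexagesimal_to_degrees(text)
    if representation == "iso8601":
        return iso8601_to_mjd(text)
    raise ValueError("unknown representation: " + representation)
```

With the label present no guessing is needed: the same string "23:04:46.5" is read as hours under "hms" and as degrees under "dms".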
So to me whether this is a sensible extension to VOTable rests on whether we see VOTable as a data format which at least strives to embody clean design and do everything the Right Way, or as a pragmatic compromise which is prepared to include features of questionable purity if they look like they're going to be sufficiently useful. Of course in reality it's somewhere between the two, but in view of its current state I'd respectfully suggest it's nearer the latter extreme.
Mark
-- 
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor@bris.ac.uk   +44-117-928-8776   http://www.star.bris.ac.uk/~mbt/

Received on 2006-04-20Z20:04:53