NAME

wyrm_uriGet — Get a part from the URI.

SYNOPSIS

#include "wyrm-uri.h"

Obj wyrm_uriGet(Intr intr,Obj uri,chars part);

Obj wyrm_uriGetAll(Intr intr,Obj uri);

Obj wyrm_uriPut(Intr intr,Obj uri,chars part,Obj newpart);

Obj wyrm_uriPutList(Intr intr,Obj uri,int N,Obj *P);

Obj wyrm_uriRemove(Intr intr,Obj uri,chars part);

DESCRIPTION

This package provides a loosely object-oriented way to use URIs, with the URI scheme identifying the kind of object. URIs can be parsed and their parts extracted, replaced, or added. A method can be identified from the scheme and then evaluated. The objects are stored in an OAV mapping and use OAV delegation and parent identification. This OAV mapping is called the urispace.

A URI is a Uniform Resource Identifier. A URL, or Uniform Resource Locator, is one kind of URI; the other kind is the experimental URN, or Uniform Resource Name. A URL indicates one specific location of a resource and how to access it. A URN indicates a resource in general and must then be resolved to a URL automatically. Because URNs are still experimental, in practice all URIs are URLs. These routines can, however, manipulate both URLs and URNs.

Each URI belongs to a scheme, such as 'ftp' or 'telnet'. The scheme name begins the URI, except in the relative scheme: a URI without a leading scheme is always a URL of the '(relative)' scheme. If the scheme cannot be identified in the urispace OAV mapping, it is treated as a generic scheme.
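As a minimal sketch of the first step, the scheme name can be picked off the front of a URI string. The helper name scheme_of is hypothetical and not part of wyrm-uri.h; the sketch assumes the usual scheme spelling (a letter followed by letters, digits, '+', '-', or '.', then ':') and does not model the urispace lookup or the generic-scheme fallback.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True if c may appear in a scheme name after the first letter. */
static int is_scheme_char(char c)
{
    return isalnum((unsigned char)c) || c == '+' || c == '-' || c == '.';
}

/* Hypothetical helper, not part of wyrm-uri.h.
 * Copies the scheme name of `uri` into `out` (capacity `cap`) and returns
 * the offset at which the rest of the URI begins. A URI with no leading
 * "name:" prefix is reported as the "(relative)" scheme, offset 0. */
size_t scheme_of(const char *uri, char *out, size_t cap)
{
    size_t n = 0;

    assert(uri != NULL && out != NULL && cap >= sizeof "(relative)");

    if (isalpha((unsigned char)uri[0])) {
        n = 1;
        while (is_scheme_char(uri[n]))
            n++;
    }
    if (n > 0 && uri[n] == ':' && n < cap) {
        memcpy(out, uri, n);
        out[n] = '\0';
        return n + 1;            /* skip the ':' as well */
    }
    strcpy(out, "(relative)");
    return 0;
}
```

With this sketch, 'ftp://hodgepodge/x/y/z;type=d' yields the scheme 'ftp', while 'x/y/z' yields '(relative)'.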

The syntax of a URI depends on its scheme; the syntax defines the URI as a sequence of parts with various delimiters. For example, with the URL 'ftp://hodgepodge/x/y/z;type=d', the scheme is 'ftp' and the parts are 'scheme' (ftp), 'host' (hodgepodge), 'path' (/x/y/z), and 'parameter' (;type=d). The 'ftp' scheme has additional parts, such as 'user' and 'port', which are empty in this specific URI. Each part of a URI is named; these names are used to extract, replace, and remove parts in parsed URIs. Part names are case sensitive.
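To make the part names concrete, here is a rough sketch that splits a URL of the shape 'scheme://host/path;parameter' into the four parts named in the example. The struct uri_parts and the function split_uri are hypothetical illustrations, not the library's parser: the real parsing pattern comes from the urispace, and 'user' and 'port' are deliberately ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the four parts used in the example. */
struct uri_parts {
    char scheme[16];
    char host[64];
    char path[128];
    char parameter[64];
};

/* Copies n bytes of src into dst as a string; -1 if it does not fit. */
static int copy_span(char *dst, size_t cap, const char *src, size_t n)
{
    if (n >= cap)
        return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}

/* Splits "scheme://host/path;parameter" into its parts.
 * Returns 0 on success, -1 if there is no "://" or a part is too long. */
int split_uri(const char *uri, struct uri_parts *p)
{
    const char *sep, *host, *path, *param;

    assert(uri != NULL && p != NULL);

    sep = strstr(uri, "://");
    if (sep == NULL)
        return -1;
    host = sep + 3;
    path = host + strcspn(host, "/;");   /* host ends at first '/' or ';' */
    param = path + strcspn(path, ";");   /* path ends at the first ';'    */

    if (copy_span(p->scheme, sizeof p->scheme, uri, (size_t)(sep - uri)) != 0)
        return -1;
    if (copy_span(p->host, sizeof p->host, host, (size_t)(path - host)) != 0)
        return -1;
    if (copy_span(p->path, sizeof p->path, path, (size_t)(param - path)) != 0)
        return -1;
    return copy_span(p->parameter, sizeof p->parameter, param, strlen(param));
}
```

The parameter part keeps its leading ';', matching the way the example above writes it.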

wyrm_uriGet gets one part out of a URI. The URI scheme is identified, its parsing pattern and parts list are found in the urispace, the URI is parsed, and the requested part is returned. If the part is not in the scheme's parts list, an undefined error is returned; if the part is in the parts list but not in the specific URI, a missing error is returned.

Special processing can be done on 'path' and 'parameter' parts as indicated elsewhere.

wyrm_uriGetAll gets all parts out of a URI. All existing parts are returned in a list of part/value pairs.

wyrm_uriPut puts a part value into a URI. The value replaces an existing part or adds a new part. The resulting URI is generated and returned.

Special processing can be done on 'path' and 'parameter' parts as indicated elsewhere.

wyrm_uriPutList puts several part values into a URI. Each value replaces an existing part or adds a new part. The resulting URI is generated and returned.

Special processing can be done on 'path' and 'parameter' parts as indicated elsewhere.

wyrm_uriRemove removes a part and its value from a URI. If the scheme is removed, the relative scheme "(relative)" is added. The resulting URI is generated and returned.

Non-ASCII characters, and any characters other than letters, digits, or "-_.!~*+'()=/", that occur in a URI part should be escaped to prevent their misinterpretation as a part delimiter. The escape sequence is a '%' followed by two hexadecimal digits, which give the code of the escaped character.

For example, if a host name is 'dr.who?', in an http URI it would be encoded as 'http://dr.who%3f/cgi/q?query' so that the "?" in the host name part is not mistaken for the query part delimiter. The isolated parts extracted or replaced do not use escape sequences; escape sequences are used only in the original parsed URI and the final generated URI.
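The escaping rule can be sketched as a pair of helpers. The names escape_part and unescape_part are hypothetical and not part of wyrm-uri.h; the safe set is copied from the text above, and lowercase hexadecimal digits are assumed because the example writes '%3f'.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True if c may appear unescaped: ASCII letters, digits, or the listed set. */
static int is_safe(unsigned char c)
{
    if (c == '\0' || c >= 0x80)
        return 0;
    return isalnum(c) || strchr("-_.!~*+'()=/", c) != NULL;
}

/* Hypothetical helper: writes the escaped form of `in` to `out`.
 * Returns the length written, or (size_t)-1 if `cap` is too small. */
size_t escape_part(const char *in, char *out, size_t cap)
{
    static const char hex[] = "0123456789abcdef";
    size_t n = 0;

    assert(in != NULL && out != NULL);

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (is_safe(c)) {
            if (n + 1 >= cap)
                return (size_t)-1;
            out[n++] = (char)c;
        } else {
            if (n + 3 >= cap)
                return (size_t)-1;
            out[n++] = '%';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0f];
        }
    }
    if (n >= cap)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}

/* Value of one hexadecimal digit, or -1 if c is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decodes %xx sequences in place.
 * Malformed sequences are copied through unchanged. */
void unescape_part(char *s)
{
    char *w = s;

    assert(s != NULL);

    while (*s != '\0') {
        int hi = (s[0] == '%') ? hex_value(s[1]) : -1;
        int lo = (hi >= 0) ? hex_value(s[2]) : -1;
        if (lo >= 0) {
            *w++ = (char)(hi * 16 + lo);
            s += 3;
        } else {
            *w++ = *s++;
        }
    }
    *w = '\0';
}
```

Applied to the host name 'dr.who?', escape_part produces 'dr.who%3f', and unescape_part turns that back into 'dr.who?'.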

Paths are converted to canonical form, without embedded '.' or '..' components or multiple '/' separators. A '..' component remains only at the beginning of a path; no '.' component remains.

For example,

/a/b/c remains /a/b/c
/a/b/./c becomes /a/b/c
/a/b///c becomes /a/b/c
/a/b/c/ remains /a/b/c/
/a/b/c/.. becomes /a/b
/a/b/c/../ becomes /a/b/
/a/b/../../c/ becomes /c
/a/b/../../../../c/ becomes /c
/a/./b/..///c/ becomes /a/c
a/b/c remains a/b/c
./a/b/c becomes a/b/c
./a/b/c/. becomes a/b/c
../a/b/c remains ../a/b/c
../../../a/b/c remains ../../../a/b/c

This operates on the URI path, not on a file system path. Components that are symbolic links on a particular file system are not modified.
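The canonical form can be approximated with a purely textual pass over the path, which is consistent with never consulting the file system. The helper canonical_path below is a hypothetical sketch, not the library routine. Three choices in it are assumptions: a trailing '/' is kept whenever the input ends in '/', a '..' that would climb above the root of an absolute path is dropped, and a relative path that reduces to nothing yields the empty string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_COMPONENTS 64

/* Hypothetical helper: writes the canonical form of `path` into `out`.
 * Returns 0 on success, -1 if there are too many components or `cap`
 * is too small. Works on text only; symbolic links are not resolved. */
int canonical_path(const char *path, char *out, size_t cap)
{
    const char *comp[MAX_COMPONENTS];   /* start of each kept component */
    size_t len[MAX_COMPONENTS];         /* length of each kept component */
    size_t count = 0, n = 0, i, total;
    int absolute, trailing;
    const char *p;

    assert(path != NULL && out != NULL);

    total = strlen(path);
    absolute = (path[0] == '/');
    trailing = (total > 0 && path[total - 1] == '/');

    for (p = path; *p != '\0'; ) {
        size_t k = strcspn(p, "/");     /* length of the next component */

        if (k == 0 || (k == 1 && p[0] == '.')) {
            /* empty component (from "//") or ".": drop it */
        } else if (k == 2 && p[0] == '.' && p[1] == '.') {
            if (count > 0 &&
                !(len[count - 1] == 2 && memcmp(comp[count - 1], "..", 2) == 0)) {
                count--;                /* ".." cancels the previous component */
            } else if (!absolute) {
                if (count == MAX_COMPONENTS)
                    return -1;
                comp[count] = p;        /* leading ".." of a relative path stays */
                len[count++] = 2;
            }
            /* absolute path: ".." above the root is dropped */
        } else {
            if (count == MAX_COMPONENTS)
                return -1;
            comp[count] = p;
            len[count++] = k;
        }
        p += k;
        if (*p == '/')
            p++;
    }

    if (absolute) {
        if (n + 1 >= cap)
            return -1;
        out[n++] = '/';
    }
    for (i = 0; i < count; i++) {
        int slash = (i + 1 < count) || trailing;
        if (n + len[i] + (size_t)slash >= cap)
            return -1;
        memcpy(out + n, comp[i], len[i]);
        n += len[i];
        if (slash)
            out[n++] = '/';
    }
    if (n >= cap)
        return -1;
    out[n] = '\0';
    return 0;
}
```

For instance, this sketch maps '/a/b/./c' and '/a/b///c' to '/a/b/c', '/a/b/c/..' to '/a/b', and leaves '../../../a/b/c' unchanged.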

A URI path is usually a Unix-style path, and it normally suffices to treat it as such, but it can be more complicated. A path may contain escaped slashes, dots, and other characters, and these are interpreted differently in their escaped and unescaped forms. If a URI has a path, additional part names are defined:

PATH
The path is split at '/' separators into a list of path components; path components are individually escaped and may contain slashes. The first element is '/' for an absolute path, or '.' or multiple '..' elements for relative paths. A final '/' element is appended if the path ends in '/'. Any other slash in the list is escaped in the URI.
unix
A path list compatible with 'file split' and 'file join' on a Unix machine. This is actually identical to the 'PATH' list.
macintosh
The path is split at '/' separators into a list of path components; path components are individually escaped and may contain slashes. The first element has a suffixed ':' for an absolute path (it is a volume name), or possibly multiple '::' elements for relative paths. Any other slash in the list is escaped in the URI.

A URI parameter part ';a;b=q;c=r' can be regarded as a single parameter block, or as a list of individual parameter parts ';a', ';b=q', and ';c=r'. If the part name 'parameter' is used, the parameters are presented as a single packed block. If the part name begins with a ';', the individual parameter values are used. The parameter value is the substring after '=' (with an implicit value of '1' if no '=' is given). In this example, the value of 'parameter' is ';a;b=q;c=r', ';a' is '1', ';b' is 'q', and ';c' is 'r'. If the 'parameter' part is not accessed, the parameter values are individually escaped, so that ';' and '=' can appear in parameter values as escape sequences.
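The lookup of an individual parameter inside a packed block can be sketched as follows. The helper param_value is hypothetical and not part of wyrm-uri.h; it takes the part name with its leading ';', applies the implicit value '1', and leaves escape sequences undecoded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds parameter `name` (written with its leading
 * ';', for example ";b") in a packed block such as ";a;b=q;c=r" and
 * copies its value into `out`. A parameter without '=' has the implicit
 * value "1". Returns 0 if found, -1 if absent or the value does not fit. */
int param_value(const char *block, const char *name, char *out, size_t cap)
{
    size_t name_len;
    const char *p;

    assert(block != NULL && name != NULL && out != NULL);

    if (name[0] != ';' || name[1] == '\0')
        return -1;
    name_len = strlen(name + 1);

    for (p = block; *p == ';'; ) {
        const char *item = p + 1;
        size_t item_len = strcspn(item, ";");   /* whole "key" or "key=value" */
        size_t key_len = strcspn(item, ";=");   /* key ends at '=' or item end */

        if (key_len == name_len && memcmp(item, name + 1, name_len) == 0) {
            const char *val = "1";              /* implicit value */
            size_t val_len = 1;

            if (item[key_len] == '=') {
                val = item + key_len + 1;
                val_len = item_len - key_len - 1;
            }
            if (val_len >= cap)
                return -1;
            memcpy(out, val, val_len);
            out[val_len] = '\0';
            return 0;
        }
        p = item + item_len;                    /* next ';' or end of block */
    }
    return -1;
}
```

Given the block ';a;b=q;c=r', the sketch returns '1' for ';a', 'q' for ';b', and 'r' for ';c', and reports ';d' as absent.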

SEE ALSO

uri (1WY), urispace (4WY), wyrm_uriAbsolutise (3WY), wyrm_uriMethod (3WY), wyrm_urispace (3WY).