NAME

urispace — Static urispace (default uri definitions and methods).

SYNOPSIS

generic-uri ::= scheme: [ user-host-address-and-path ] [ parameter ] [ search ] [ fragment ]

scheme ::= a|...|z|A|...|Z|0|...|9|.|+|-...
user-host-address-and-path ::= user-host-address [ absolute-path ] | user-host-address/./relative-path | path
user-host-address ::= // [ user@ ] host [ :port ]
path ::= absolute-path | relative-path
absolute-path ::= /relative-path
relative-path ::= component | relative-path/component
component ::= . | .. | string-not-including-/
parameter ::= parameter-item | parameter parameter-item
parameter-item ::= ;parameter-name [ =parameter-value ]
search ::= ?string
fragment ::= #string

relative-uri ::= [ user-host-address-and-path ] [ parameter ] [ search ] [ fragment ]

file-uri ::= file: [ //host ] [ path ] [ parameter ]

data-uri ::= data: [ type ] [ subtype ] [ parameter ] [ data ]

type ::= string
subtype ::= /string
data ::= ,string

ftp-uri ::= ftp: [ user-password-host-address-and-path ] [ parameter ]

user-password-host-address-and-path ::= user-password-host-address [ absolute-path ] | user-password-host-address/./relative-path | path
user-password-host-address ::= // [ user [ :password ] @ ] host [ :port ]

gopher-uri ::= gopher: [ host-address ] [ /gtype selector [ gsearch [ gstring ] ] ] [ parameter ]

host-address ::= //host [ :port ]
gtype ::= /character
selector ::= /variable
gsearch ::= tab string
gstring ::= tab string
tab ::= tab-character-x09 | %09

http-uri ::= http: [ host-address-and-path ] [ parameter ] [ search ] [ fragment ]

mailto-uri ::= mailto: [ user@ ] host [ :port ] [ parameter ] [ ?headers ]

headers ::= header | headers&header
header ::= header-type=header-content

nntp-uri ::= nntp: [ user-password-host-address ] [ /group [ /article ] ] [ parameter ]

pop-uri ::= pop: [ user authorisation@ ] [ host ] [ :port ] [ message ] [ parameter ]

authorisation ::= [ ;AUTH=authorisation type ] [ ;authorisation data ]
message ::= /message number

channel-uri ::= channel: channel-identifier

none-uri ::= none: string

oav-uri ::= oav: variable :path [ parameter ]

DESCRIPTION

Information about URIs is stored in an OAV mapping referred to as the urispace. This includes the parsing pattern, the generation format, the list of parts, and definitions of methods. OAV permits objects to be delegated, and for parents to be searched. All this flexibility is available for resolving information about URIs. The uri command consults this OAV mapping to parse URIs and evaluate methods on them.

The uri command has a static mapping compiled into it to provide a default set of definitions. A default internal urispace is initially defined. This space can be supplemented by defining another mapping, possibly delegating to or copying from the original default mapping. The new mapping can then be made the new urispace.

The wyrm-urispace defines standard URLs for file, data, ftp, gopher, http, mailto, news, nntp, pop, and relative, and nonstandard URLs for OAV mappings and no resource. A generic URI is defined which should match most other URIs. The open, get, put, delete, status, and children methods are defined for each of these URIs, though in many cases the default implementation simply returns an error saying that the operation is not implemented.

URIs

Generic URL
A generic URI might match a URI that is not otherwise defined. Because of the OAV delegation rules, if the key uri.scheme.string is not found, the key uri.string will be sought. Such keys are the generic URI keys; there are no 'uri.generic....' keys, it is the lack of a key that makes it generic. The generic URI also provides default definitions for the methods.
Relative URL
A relative URL is any URI without a scheme. The key is 'uri.(relative)....'. (Parentheses cannot appear in a scheme, so this prevents confusion with a real scheme.) uri will supply the scheme '(relative)' if it cannot find one in the URI or if the scheme is removed.
Relative URL methods are delegated to the file URL.
File URL
The resource is some file on the local file system. (The host name is ignored.)
The file is opened with the normal Tcl file open. If the file exists, the status is the Tcl file status (with exists and uri keys), and its children are the glob results. If the file does not exist, its status has exists 0 and the uri.
The open method permits any of the usual open modes and permissions. If any of the parameters ';blocking', ';buffering', ';buffersize', ';encoding', ';eofchar', or ';translation' are in the URL, they are used to configure the file after open.
set ch [::wyrm::uri open "file:abc;translation=cr%20crlf;buffersize=1024" -mode w+ -pem 600]
has the same effect as
set ch [open abc w+ 600]
fconfigure $ch -translation {cr crlf}
fconfigure $ch -buffersize 1024
Data URL
The data URL resource is the string in the URL itself. One of its parameters can be 'base64' to indicate the string is in BASE64 encoding.
The data URL can be opened with the data command. The data status includes the mimetype in addition to the obligatory exists and uri keys.
FTP URL
An FTP URL has the information to open an FTP connection and send or receive a file.
The open method returns the data socket connected to the file on the remote FTP server. It supports read (r or RDONLY), write (w or WRONLY), or append (a or APPEND) modes. The transfer type image (I) or ASCII (A) can be set with the ";type=I" or ";type=A" parameter. The configuration parameters blocking, buffering, buffersize, encoding, eofchar, and translation can be specified as parameters; they are applied to the returned data socket channel. The FTP password can also be passed in with -password pass, or as a script that evaluates to the password with -script pass-script.
Gopher URL
A Gopher URL has the information to open a Gopher connection and receive a page.
Currently no methods are implemented.
HTTP URL
An HTTP URL has the information to open an HTTP connection and receive a page. The HTTPS scheme is included with HTTP, but it must be separately registered with http::register.
The get method uses the http package supplied with Tcl. It offers a number of additional parameters that are passed through to this package: -binary 0|1, -blocksize size, -channel output-channel, -command http-package-callback, -handler callback, -progress callback, -timeout milliseconds, -header variable, and +header header-value. Additional configuration can be handled by calling http::config and http::register directly. http-package-callback receives the http token, while the -async callback receives the resource and possible error, the same as other uri get methods. The -header variable stores the response headers and metadata in the associative mapping in the variable; the variable must be at global or namespace scope for asynchronous calls; it is ignored if the callback is given with -command.
The POST request is handled by a separate post method:
uri post url post-data option...
It accepts the get parameters as well as -postblocksize size, -postprogress callback, and -type type/type. If no type is given and the post-data is a list, it is converted to x-url-encoding.
The status method accepts all the get parameters, except -header (since the return value includes the headers).
Mail URL
A mailto URL identifies the recipient of an email. Headers and even body text of the message can be specified in the URL. To actually send the message, the text body and other headers may have to be supplied, and the mail protocol and host, usually SMTP, must be supplied.
The put method mails the message. The data can contain headers before the message body. Additional headers can be specified in the URL. Two parameters are used: ";relay=smtp-host" identifies the SMTP relay host; ";sender=user@host" identifies the sender. If the relay is not specified, it attempts to get the receiver's MX DNS records. If the sender is not specified, it is taken from the message headers.
NNTP URL
An NNTP URL identifies a group or article on an NNTP host and provides the information to establish a connection. (Some ISPs do not require user identification to connect to an NNTP server; the server will only accept connections from IP addresses within the ISP.)
The get method retrieves the news article. The put method posts the data as an article; it ignores any group or article in the URI. The status method gets the article headers. The children method gets all groups on the host, or all articles in a group; it accepts the URL parameter ";after=seconds" (with seconds as from [clock seconds]), in which case only groups or articles after that time are returned. The NNTP password can also be passed in with -password pass, or as a script that evaluates to the password with -script pass-script.
POP3 URL
A POP URL has the information to open a POP3 connection and receive email. A specific message can be identified. Various authorisation schemes may be specified such as a simple password or an MD5 challenge and response.
A POP3 resource can be gotten to retrieve one message, deleted to mark that it has been read, and queried for its children to return a list of message numbers. It is a read-only resource and does not support the put method. The POP3 password can also be passed in with -password pass, or as a script that evaluates to the password with -script pass-script.
Channel URL
A channel URL is a nonstandard URL which identifies an already open Tcl channel.
None URI
A nonstandard none URI explicitly indicates no resource exists. A comment can be added but it is not used by any of the methods.
A none resource cannot be opened (it does not exist).
OAV URL
An OAV URL is a nonstandard scheme to access a variable (key) in an OAV file mapping.
A variable can be gotten or put to the mapping. The children are all keys in the mapping. Status is the status of the file.

URI Space Objects

If a URI has the scheme 'scheme', the parse pattern is gotten with the OAV key uri.scheme.+pattern, the parts list with uri.scheme.+parts, the generate format with uri.scheme.+generate, and the Tcl code implementing a method with uri.scheme.method. These keys may end up delegated elsewhere.

New URIs can be added to a modifiable urispace with the oav command. For example a "qwerty" scheme can be defined with

oav put [uri space] uri.qwerty.+parts {scheme text}
oav put [uri space] uri.qwerty.+pattern {^(qwerty):(.*)$}
oav put [uri space] uri.qwerty.+generate {s:s}
oav method [uri space] uri.qwerty.open {uri args} {open "qwerty" r}

A generic URI is provided by defining the keys uri.+pattern, uri.+parts, uri.+generate, and uri.method for generic methods. If a scheme is not (completely) defined in the urispace, then the OAV delegation rules will find these generic pattern, parts, generate, and methods for what is not specifically defined.

Keys not conforming to these patterns may also be in the urispace. They are not directly accessed, but they may be delegated to or used for routines other than uri.

URIs are parsed with regular expressions. Parentheses surround the subexpressions matching parts. "(?:...)" is used to group subexpressions that are not parts. For example

...(?://([^:/]*)(?::([^/]*))?)?...
could parse a host address. "([^:/]*)" matches the host part, and "([^/]*)" the port part. "(?::([^/]*))?" matches the optional colon+port substring, and the "?:" means it does not correspond to a specific URI part. "(?://([^:/]*)(?::([^/]*))?)?" matches the whole host address containing the host and port parts.
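
The fragment can be tried on its own in plain Tcl. The following is only a sketch: it anchors the fragment with "^", makes the colon+port group optional so that a host without a port still matches, and adds a trailing "(.*)" to capture the remainder. Those additions and the variable names are assumptions made for illustration, not part of the urispace patterns.

```tcl
# Host-address fragment, anchored, with an optional port group and a
# trailing (.*) for whatever follows (assumptions for this illustration).
set pattern {^(?://([^:/]*)(?::([^/]*))?)?(.*)$}

foreach address {//example.org:8080/index.html //example.org/index.html /index.html} {
    # "(?:...)" groups produce no match variable, so only host, port,
    # and rest follow the whole-match variable.
    regexp -- $pattern $address all host port rest
    puts [list $address -> host $host port $port rest $rest]
}
```

Subexpressions that do not take part in the match leave their variables set to the empty string, which is how a missing port or a missing host address shows up here.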

The parts list is a list of the names of the parts corresponding to the parenthesised (but not "(?:...)" parenthesised) subexpressions.

The parsing is similar to

eval [list regexp [oav get [uri space] uri.scheme.+pattern] $uri] all [oav get [uri space] uri.scheme.+parts]
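
The same pairing of a pattern with a parts list can be sketched without the oav command. The proc name parseWithParts is hypothetical, and the pattern and parts list are passed in directly rather than looked up in a urispace; it returns a dictionary instead of setting one variable per part, and needs Tcl 8.5 or later for dict.

```tcl
# Parse a URI with a pattern and a parts list, returning a dict that
# maps each part name to its matched text. (Hypothetical helper.)
proc parseWithParts {pattern parts uri} {
    # With -inline, regexp returns the whole match followed by one
    # element per capturing subexpression, or an empty list on failure.
    set matches [regexp -inline -- $pattern $uri]
    if {[llength $matches] == 0} {
        error "URI \"$uri\" does not match the pattern"
    }
    set result [dict create]
    # Skip element 0 (the whole match) and pair the rest with part names.
    foreach name $parts value [lrange $matches 1 end] {
        dict set result $name $value
    }
    return $result
}

# Pattern and parts list of the "qwerty" example scheme.
set parsed [parseWithParts {^(qwerty):(.*)$} {scheme text} "qwerty:hello world"]
puts [dict get $parsed scheme]
puts [dict get $parsed text]
```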

After the URI has been parsed into parts and possibly edited, it is regenerated with its scheme's generate format. The format is a string of formatters and literal characters; formatters are single letters each of which corresponds to one part, and the brackets '(' and ')'. The URI parts are generated in parts list order under the control of formatter letters.

formatter 'x':
Do not format the part, just skip it.
formatter 'h':
Insert a '/./' between the host address and path if the host address is not empty and the path does not begin with '/', for example '//host' + 'pa/th' would yield '//host/./pa/th'. The standards presume the path after a host address is always absolute, and this forces that. No part is actually formatted. (This formatter does not have to be in this position. In particular it inserts '/./' if the currently generated substring is not empty and the next part does not begin with '/'.)
formatter 's':
Add the part string with all special characters replaced with escape sequences.
formatter 'r':
Add the part string with all special characters except '&' and '?' replaced with escape sequences. This is intended for search parts.
formatter 'p':
Add the part string with all special characters except ';' and '=' replaced with escape sequences. This is intended for parameter parts.
formatters '(' and ')':
Optional substrings. If all parts formatted between a '(' and matching ')' are missing, the contents of the bracketed substring are suppressed, even if literal characters are added. For example with a network address format '...(//s(:s))...' if the host part is present as 'abc' and the port part missing, the ':'+missing port substring is suppressed, but the '//'+host part 'abc'+suppressed port generates '//abc'. Brackets suppress literal characters that would delimit otherwise missing parts.
Literal characters:
Other characters are added as is. They may still be suppressed if they are in bracketed subexpressions. By happy not-so-coincidence, literal characters and formatters are distinct.
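
The bracket rule is easiest to see in a small model. The sketch below is not the uri command's implementation: the proc names are hypothetical, the 'h' formatter is left out, an empty string is treated as a missing part, and "special characters" are assumed to be everything other than letters, digits, and "-._~", since the text does not list them.

```tcl
# Percent-escape every byte that is not an unreserved character or
# listed in keep. (The choice of unreserved set is an assumption.)
proc escapePart {text keep} {
    set out ""
    foreach ch [split [encoding convertto utf-8 $text] ""] {
        if {[regexp -- {^[A-Za-z0-9._~-]$} $ch] || [string first $ch $keep] >= 0} {
            append out $ch
        } else {
            append out [format %%%02X [scan $ch %c]]
        }
    }
    return $out
}

# Generate a URI from a format string and a list of part values.
proc generateUri {fmt parts} {
    # Characters each formatter letter leaves unescaped.
    set keepChars [dict create s "" r "&?" p {;=}]
    # One frame per open bracket: {text-so-far sawPart}.
    set stack [list [list "" 0]]
    set index 0
    foreach f [split $fmt ""] {
        lassign [lindex $stack end] text sawPart
        switch -exact -- $f {
            "(" {
                lappend stack [list "" 0]
                continue
            }
            ")" {
                # Pop the group; keep its text only if a part was present.
                set stack [lrange $stack 0 end-1]
                if {$sawPart} {
                    set outer [lindex $stack end 0]
                    set stack [lreplace $stack end end [list $outer$text 1]]
                }
                continue
            }
            x {
                # Skip this part without formatting it.
                incr index
                continue
            }
            s - r - p {
                set value [lindex $parts $index]
                incr index
                if {$value ne ""} {
                    append text [escapePart $value [dict get $keepChars $f]]
                    set sawPart 1
                }
            }
            default {
                # Literal character.
                append text $f
            }
        }
        set stack [lreplace $stack end end [list $text $sawPart]]
    }
    return [lindex $stack 0 0]
}

puts [generateUri {(//s(:s))} {abc ""}]
puts [generateUri {(//s(:s))} {abc 8080}]
```

With the port missing, the inner group holding ':' is dropped while the outer group survives because the host is present; with both parts present, both groups are kept.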

Generic Methods

Open Method

The open, get, and put methods transfer a resource as a string of bytes. Each URI should provide a method to open a channel to the resource so that the caller can use puts and read to move the resource bytes, or the URI should provide get and put methods to move the bytes en masse. The generic URI provides an open method that can open a data channel using specific get and put methods, and it provides get and put methods that can use a specific open method. A URI can also provide all three open, get, and put methods.

A URI for a read-only resource need only define the get method; the put method will eventually fail.

The generic methods use an additional method argument, -loop, to detect when the URI provides neither an open method nor a get or put method.

::wyrm::uri open uri [ -mode mode | -pem permissions | -async callback-script | -loop caller ] ...
mode
The usual open mode such as "r", "w+", "RDONLY", et cetera. The generic open only supports "r", "RDONLY", "w", and "WRONLY". The default is "r".
permissions
The permissions number for a newly created file. Default is 0640.
callback-script
If not specified, the open always waits for the open to succeed or fail before returning the channel name or an error message. If the script is specified, the open returns an empty string immediately; at some later time when the open is completed, the channel name is list-appended to callback-script and the script evaluated if the open succeeded, or some string and an error message are appended and the script evaluated if an error occurred. If the resource does not support asynchronous open, it will still simulate it by returning a blank string and scheduling the script as an idle script. If the resource is opened for writing, errors might not be detected until the channel is closed; in this case the callback-script will be called twice: once with the channel name, and once again with the channel name and error message.
-loop
Detects unimplemented methods. It can be ignored as long as the resource actually implements the open or get and put methods without calling the generic methods.

Open a channel to the URI resource. Caller is responsible for closing the channel; a writeable channel might not actually transfer the resource bytes until the channel is closed.
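
The callback convention can be sketched in plain Tcl without the uri command. Everything named here is hypothetical: simulatedAsyncOpen stands in for a resource with no native asynchronous open, the empty string is used as the placeholder first argument on failure, and the Tcl executable serves only as a file that is known to exist.

```tcl
# Open synchronously, then report through the callback at idle time,
# returning an empty string immediately. (Hypothetical stand-in.)
proc simulatedAsyncOpen {path callbackScript} {
    if {[catch {open $path r} result]} {
        # Failure: a placeholder string and the error message are appended.
        after idle [linsert $callbackScript end "" $result]
    } else {
        # Success: only the channel name is appended.
        after idle [linsert $callbackScript end $result]
    }
    return ""
}

# Callback: label comes from the script itself; the remaining
# arguments are the ones appended by the open.
proc onOpened {label channel {errorMessage ""}} {
    if {$errorMessage ne ""} {
        puts "$label: open failed: $errorMessage"
    } else {
        puts "$label: opened $channel"
        close $channel
    }
    set ::done 1
}

simulatedAsyncOpen [info nameofexecutable] [list onOpened demo]
vwait ::done
```

Building the callback with list and extending it with linsert keeps the appended values as separate, properly quoted arguments, so a channel name or error message containing spaces reaches the callback intact.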

Get Method

::wyrm::uri get uri [ -async callback-script | -loop caller ] ...
callback-script
If not specified, the get always waits for the data or failure before returning the data or an error message. If the script is specified, the get returns an empty string immediately; at some later time when the get is completed, the data is list-appended to callback-script and the script evaluated if the get succeeded, or some string and an error message are appended and the script evaluated if an error occurred. If the resource does not support asynchronous get, it will still simulate it by returning a blank string and scheduling the script as an idle script.
-loop
Detects unimplemented methods. It can be ignored as long as the resource actually implements the open or get and put methods without calling the generic methods.

Get the bytes of a URI resource. The entire resource is transferred in the one call; no further action is necessary to release or close the resource.

Put Method

::wyrm::uri put uri data [ -pem permissions | -async callback-script | -loop caller ] ...
permissions
The permissions number for a newly created file. Default is 0640.
callback-script
If not specified, the put always waits for the data to be sent or a failure before returning an empty string or an error message. If the script is specified, the put returns an empty string immediately; at some later time when the put is completed, the callback-script is evaluated if the put succeeded, or an error message is appended and the script evaluated if an error. If the resource does not support asynchronous put, it will still simulate it by returning a blank string and scheduling the script as an idle script.
-loop
Detects unimplemented methods. It can be ignored as long as the resource actually implements the open or get and put methods without calling the generic methods.

Put the bytes to a URI resource. The entire resource is transferred in the one call; no further action is necessary to release or close the resource.

Delete Method

::wyrm::uri delete uri [ -r recursive ] [ -async callback-script ]
recursive
If true, all children are deleted as well. If a resource has children and recursive is false or not specified, the resource cannot be deleted.
callback-script
If not specified, the delete always waits for the resource to be deleted or a failure before returning an empty string or an error message. If the script is specified, the delete returns an empty string immediately; at some later time when the delete is completed, the callback-script is evaluated if the delete succeeded, or an error message is appended and the script evaluated if an error. If the resource does not support asynchronous delete, it will still simulate it by returning a blank string and scheduling the script as an idle script.

Deletes the URI resource. The generic implementation fails with the error message it cannot delete the resource.

Status Method

::wyrm::uri status uri [ -async callback-script | key default-value ] ... [ -async callback-script ]
callback-script
If not specified, the status always waits for the resource status before returning the status list. If the script is specified, the status returns an empty string immediately; at some later time when the status is completed, the status list is list-appended as one element to callback-script and the script evaluated. If the resource does not support asynchronous status, it will still simulate it by returning a blank string and scheduling the script as an idle script.
key default-value
These keys are always returned in the status list. If the resource does not return a particular key, its default-value is returned in the status list.

Return information about the resource. This is a list of key/value pairs which always contains the keys "exists" and "uri". Any key/value pairs on the command define default pairs in the result. These defaults are replaced by real status values provided by the resource.

The status method always succeeds in returning some kind of list. If the resource does not exist, it returns the key "exists" with the value 0; other failures can be indicated by additional keys.
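
The way defaults give way to real values can be modelled with dict merge, where later dictionaries override earlier ones. This is a sketch under assumptions, not the status method itself: the proc name mergeStatus and the size and mtime keys are hypothetical, and the starting value of exists 0 is chosen to match the missing-resource case described above.

```tcl
# Combine the obligatory keys, the caller's defaults, and the values
# the resource reports; later arguments to dict merge take precedence.
proc mergeStatus {uri defaults real} {
    set base [dict create exists 0 uri $uri]
    return [dict merge $base $defaults $real]
}

# The resource reports exists and size but no mtime, so the default
# mtime survives while the default size is replaced.
set status [mergeStatus "file:abc" {size 0 mtime 0} {exists 1 size 1024}]
puts "exists=[dict get $status exists] size=[dict get $status size] mtime=[dict get $status mtime]"
```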

Children Method

::wyrm::uri children uri [ -async callback-script ]
callback-script
If not specified, the children always waits for the children list or a failure before returning the children list or an error message. If the script is specified, the children returns an empty string immediately; at some later time when the children is completed, the children list is list-appended as one element to callback-script and the script evaluated if the children succeeded, or some string and an error message are appended and the script evaluated if an error occurred. If the resource does not support asynchronous children, it will still simulate it by returning a blank string and scheduling the script as an idle script.

Return a list of URIs which are somehow subsidiary to the given URI. For example, the files in a directory URI would be its children. The generic method returns an empty list.

SEE ALSO

oav (1WY), uri (1WY).