ren logo

[人] Ren - The Human Notation

Implementations | Resources | Grammars | FAQ | About 

中文 Ceský Dansk Nederlands Français Deutsch Italiano 日本人 Polski Español

Example
This is what Ren looks like.
   []			; empty list
   #()			; empty map
   [a 1 true #three]	; non-empty list
   #(a: 1 b: "two")	; non-empty map
   "Ren Example 1"	; string
   -42			; number
   98.6			; another number
   true			; literal true
   false		; literal false
   none			; literal nil/null/nada
   #(			; a bigger map
       quote:	 "禅 saying: ^"仁 rocks!^""
       same:	 {禅 saying: "仁 rocks!"}
       utf-8:    "^(CE91) to ^(cf89)"
       tag:		 
       sci-phi:  0.1618e1
       tax-rate: 3.9%
       price:    USD$79.99
       url:      http://www.ren-data.org/
       email:    info@ren-data.org
       hashtag:  #ren
       ref:      @ren-user
       date:     2013-04-17/18:37:39-06:00
       warning:  00:02     ; = 00:00:120.0
       ip-addr:  127.0.0.1
       geo-pos:  43.6x116.7x817
       hex:      16#{DECAFBAD CAFE 00FF}
       base-64:  64#{UmVuIGlzIGRhdGE=}  
   )
view as JSON 


Grammar (informative, simplified)

Repetition and alternation denotation.

list
anyvalue ]   or   anyvalue )
map
#( anyname-value-pair )
name-value-pair
name  value
name
word : space
value
list or map or none or logic or name or word
or any-number or any-string or any-date-time
or any-binary or tuple or point or comment

Type Groups
any-number
number or percent or money or NaN or INF
any-string
string or implied-string
any-date-time
abs-date-time or abs-date or rel-date-time
or rel-date or rel-time
any-binary
base-16-binary or base-64-binary

string
" anychar " or multiline-string or tag
multiline-string
{ anyUTF8except{}^  or multiline-string }
tag
< anyUTF8except<>^  or tag >
char
any UTF8 code-point except CR, LF, ", ^ or
control-char

^"; double quote
^/; linefeed
^M; carriage return
^@; null
^(CE91)      ; 2, 4, 6 or 8 hex digits


number
optsign  int  optfrac  optexp
percent
number %
money
optsign $  int  optfrac

NaN (not-a-number)
1.#NaN
INF (infinity)
optsign  1.#INF

quantity (magnitude/unit) (TBD)
number / word
number / list ???
number / implied-string ???


none
none
logic
true or false or on or off or yes or no


Absolute Dates and Times
date
4digit - [01-12] - [01-31]
time-of-day
[00-23] : [00-59] optseconds
seconds
: [00-59]  optfrac
date-time-sep
/ or T
timezone
optsign  Z or sign time-of-day
date-time
date  date-time-sep  time-of-day  opttimezone


Relative Dates and Times
rel-date (TBD)
sign  4-or-moredigit - int - int
rel-time
optsign   int : int   optrel-seconds
rel-seconds
: somedigit  optfrac
rel-date-time (TBD)
rel-date  date-time-sep  rel-time
rel-date-time ???
optsign  int : int : int : rel-time


word
word-first-char  anyword-inner-char
word-first-char
notdigitor{}"()/\@#$%^,:;<>[]'
word-inner-char
word-first-char or digit

implied-string
impl-str-first-char anyimpl-str-inner-char
impl-str-first-char
notdigitor{}"()\$^,;<>[]
impl-str-inner-char
word-inner-char or  one-of/\@#$%,:'
impl-str-first-char or digit or one-of\$,


tuple (n-tuple)
int  2-or-moretuple-part
int  2-to-15tuple-part (Limit TBD)
tuple-part
. int

point
number  someaxis
number  1-to-3axis (Limit TBD)
axis
x number


base-16-binary
opt16 #{ any 2base-16-char }  ; groups of 2
base-64-binary
64#{ any 4base-64-char }  ; groups of 4

Goals:

Uses:



Ren is a lightweight data-exchange text format. It is programming language independent with familiar conventions. Whitespace separates values, but is not significant beyond that.

Ren has two main data structures:

These data structures are the lowest common denominator in modern programming languages. It could be argued—and we did—that only list is needed, but map is used so widely that it was deemed an important element. You can, of course, use only lists in Ren and interpret them as maps as you see fit.

In Ren, the data structures take these forms:

A list is an ordered group of values. A list begins with [ (left square bracket) and ends with ] (right square bracket) or begins with ( (left paren) and ends with ) (right paren). Values are separated by whitespace.

A map is an unordered set of name/value pairs. A map begins with #( (hash)(left paren) and ends with ) (right paren). The first element in a map must be a name. Names end with (colon)(whitespace) and the name/value pairs are separated by whitespace. A map is just a list, identified by the fact that the first element is a name, and it must always contain an even number of values. That is, each name must have an associated value.

A string is a series of zero or more UTF-8 code points, enclosed in double quotes, using caret escapes where necessary. A character (char) is represented as a string containing a single code point. There is no separate notation for individual characters. Mutliline strings are also allowed, enclosed in matching curly or angle brackets ({ } or < >).

A number is just how a base-10 human would write it.


Logic values are any of the literal words in the list [true false on off yes no]. Why not just true and false? Because the other words add meaning but very little complexity.

The word none means nothing. Rather, it means something could be there, but nothing is. It isn't zero, that would be null, but Ren doesn't use null; it uses the number zero to mean zero. None is like nil or nothing. When the choice was made, nil nothing none of the other options sounded as good.


What happens when we want to venture beyond these basic value types? How do we decide what Ren can and should support?

First, look at the things people talk about, and that need to be represented when communicating between people (especially technical people like programmers), or between programs and systems. A few obvious examples are dates and times, email and IP addresses, UI positions and sizes, and URLs.

Next, consider what can be represented as a literal value, in plain text. By limiting ourselves to a text format, we make it possible, even easy, for people to write data directly (e.g., config files), and to read the data that came from somewhere else. One problem is that human readable text has to use a limited set of characters and conventions, or it will be too complex to deal with. That imposes a constraint in terms of the lexical space we can use. That is, once we use a character, or sequence of characters, and ascribe meaning to that, we can't use it for something else. That means we can't represent an unlimited set of value types without sacrificing simplicity.

Finally, look for generalities, weigh the options, determine priorities, and compromise.


Remember, whitespace separates values.

Comments may be included. They start with a semicolon (;) and continue through the end of a line, which is denoted by the new-line character (ASCII character code 10).

Ren has a large number of Value types, which distinguishes it from other data exchange languages that support only a few native data types, common in most programming languages, where everything else becomes a string. The problem is that you then have to look at every string value and decide what it might mean, based on its contents. This is work that shouldn't have to be redone in every application. And while it isn't possible to include every possible datatype used in every language and system, there are some common values that can provide a big return on investment for a small amount of additional complexity in the Ren grammar. Leveraging commonalities and grouping values into a shallow type hierarchy also helps manage complexity.


A date-time value conforms to the RFC 3339 standard, with the addition of allowing a / (slash) to be used in place of T as the separator between the date and time. It is one of the more complex types in Ren, but adds so much value that it's worth supporting. In addition to absolute date-time values, Ren supports relative dates and times (rel-date-time) used for durations, offsets, or spans of time. Their format is consistent with date-time values, and very close to the alternate format for durations in ISO 8601.

Word and Name values are like symbols or identifiers in languages that support them. They can't start with a digit or one of a small set of special reserved characters, but they can contain digits after the first character. They can't contain spaces. That would make them more than one word. A Name is just a word with a colon at the end. Names are most often used as keys in maps, but are valid values anywhere. While other notations use strings as keys, which means you can use almost character in a key, you have to use quotes around every name, whether they contain spaces or not. Statistics show that 99.987% of names do not contain spaces. If one of your names is in the 0.013%, replace the spaces with something else and move on, or use a list.

An implied-string is somewhere between a word and a string. You don't have to put them in quotes, but they also can't contain whitespace. They are like multi-part words that may contain certain special characters words can't, like colons and slashes. They are used for URLs, email addresses, filenames and more. For implied strings that contain spaces, they must be escaped somehow. e.g., for URLs, you can use percent encoding as a normal form.

Tuples are lists of numbers that can be used for IP addresses and color values. They use a dot as their part separator, can't contain decimal values, and must contain at least three parts (to differentiate them from decimal values).

Point values are used to denote coordinates. They are like tuples, except that they use x as the part separator and they may contain decimal values, because there is no ambiguity.

Binary values come in two flavors, base-16-binary and base-64-binary. They begin with #{ (hash left-brace) and end with } (right brace). Between the braces, the allowed characters depend on the base, and the number of characters must match a well-formed value for that type. For example, base-16 values must come in pairs of two hex digits. You cannot specify a nibble with one hex digit. Binary values may contain spaces and line breaks, which are ignored.


Implementations:



Resources: