initial release
This commit is contained in:
418
cosmic rage/docs/re.html
Normal file
418
cosmic rage/docs/re.html
Normal file
@@ -0,0 +1,418 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
|
||||
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
||||
<html>
|
||||
<head>
|
||||
<title>LPeg.re - Regex syntax for LPEG</title>
|
||||
<link rel="stylesheet"
|
||||
href="http://www.inf.puc-rio.br/~roberto/lpeg/doc.css"
|
||||
type="text/css"/>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<!-- $Id: re.html,v 1.15 2010/11/05 12:53:43 roberto Exp $ -->
|
||||
|
||||
<div id="container">
|
||||
|
||||
<div id="product">
|
||||
<div id="product_logo">
|
||||
<a href="http://www.inf.puc-rio.br/~roberto/lpeg/">
|
||||
<img alt="LPeg logo" src="lpeg-128.gif"/>
|
||||
</a>
|
||||
</div>
|
||||
<div id="product_name"><big><strong>LPeg.re</strong></big></div>
|
||||
<div id="product_description">
|
||||
Regex syntax for LPEG
|
||||
</div>
|
||||
</div> <!-- id="product" -->
|
||||
|
||||
<div id="main">
|
||||
|
||||
<div id="navigation">
|
||||
<h1>re</h1>
|
||||
|
||||
<ul>
|
||||
<li><a href="#basic">Basic Constructions</a></li>
|
||||
<li><a href="#func">Functions</a></li>
|
||||
<li><a href="#ex">Some Examples</a></li>
|
||||
<li><a href="#license">License</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</div> <!-- id="navigation" -->
|
||||
|
||||
<div id="content">
|
||||
|
||||
<h2><a name="basic"></a>The <code>re</code> Module</h2>
|
||||
|
||||
<p>
|
||||
The <code>re</code> module
|
||||
(provided by file <code>re.lua</code> in the distribution)
|
||||
supports a somewhat conventional regex syntax
|
||||
for pattern usage within <a href="lpeg.html">LPeg</a>.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The next table summarizes <code>re</code>'s syntax.
|
||||
A <code>p</code> represents an arbitrary pattern;
|
||||
<code>num</code> represents a number (<code>[0-9]+</code>);
|
||||
<code>name</code> represents an identifier
|
||||
(<code>[a-zA-Z][a-zA-Z0-9_]*</code>).
|
||||
Constructions are listed in order of decreasing precedence.
|
||||
<table border="1">
|
||||
<tbody><tr><td><b>Syntax</b></td><td><b>Description</b></td></tr>
|
||||
<tr><td><code>( p )</code></td> <td>grouping</td></tr>
|
||||
<tr><td><code>'string'</code></td> <td>literal string</td></tr>
|
||||
<tr><td><code>"string"</code></td> <td>literal string</td></tr>
|
||||
<tr><td><code>[class]</code></td> <td>character class</td></tr>
|
||||
<tr><td><code>.</code></td> <td>any character</td></tr>
|
||||
<tr><td><code>%name</code></td>
|
||||
<td>pattern <code>defs[name]</code> or a pre-defined pattern</td></tr>
|
||||
<tr><td><code>name</code></td><td>non terminal</td></tr>
|
||||
<tr><td><code><name></code></td><td>non terminal</td></tr>
|
||||
<tr><td><code>{}</code></td> <td>position capture</td></tr>
|
||||
<tr><td><code>{ p }</code></td> <td>simple capture</td></tr>
|
||||
<tr><td><code>{: p :}</code></td> <td>anonymous group capture</td></tr>
|
||||
<tr><td><code>{:name: p :}</code></td> <td>named group capture</td></tr>
|
||||
<tr><td><code>{~ p ~}</code></td> <td>substitution capture</td></tr>
|
||||
<tr><td><code>=name</code></td> <td>back reference
|
||||
</td></tr>
|
||||
<tr><td><code>p ?</code></td> <td>optional match</td></tr>
|
||||
<tr><td><code>p *</code></td> <td>zero or more repetitions</td></tr>
|
||||
<tr><td><code>p +</code></td> <td>one or more repetitions</td></tr>
|
||||
<tr><td><code>p^num</code></td> <td>exactly <code>n</code> repetitions</td></tr>
|
||||
<tr><td><code>p^+num</code></td>
|
||||
<td>at least <code>n</code> repetitions</td></tr>
|
||||
<tr><td><code>p^-num</code></td>
|
||||
<td>at most <code>n</code> repetitions</td></tr>
|
||||
<tr><td><code>p -> 'string'</code></td> <td>string capture</td></tr>
|
||||
<tr><td><code>p -> "string"</code></td> <td>string capture</td></tr>
|
||||
<tr><td><code>p -> {}</code></td> <td>table capture</td></tr>
|
||||
<tr><td><code>p -> name</code></td> <td>function/query/string capture
|
||||
equivalent to <code>p / defs[name]</code></td></tr>
|
||||
<tr><td><code>p => name</code></td> <td>match-time capture
|
||||
equivalent to <code>lpeg.Cmt(p, defs[name])</code></td></tr>
|
||||
<tr><td><code>& p</code></td> <td>and predicate</td></tr>
|
||||
<tr><td><code>! p</code></td> <td>not predicate</td></tr>
|
||||
<tr><td><code>p1 p2</code></td> <td>concatenation</td></tr>
|
||||
<tr><td><code>p1 / p2</code></td> <td>ordered choice</td></tr>
|
||||
<tr><td>(<code>name <- p</code>)<sup>+</sup></td> <td>grammar</td></tr>
|
||||
</tbody></table>
|
||||
<p>
|
||||
Any space appearing in a syntax description can be
|
||||
replaced by zero or more space characters and Lua-style comments
|
||||
(<code>--</code> until end of line).
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Character classes define sets of characters.
|
||||
An initial <code>^</code> complements the resulting set.
|
||||
A range <em>x</em><code>-</code><em>y</em> includes in the set
|
||||
all characters with codes between the codes of <em>x</em> and <em>y</em>.
|
||||
A pre-defined class <code>%</code><em>name</em> includes all
|
||||
characters of that class.
|
||||
A simple character includes itself in the set.
|
||||
The only special characters inside a class are <code>^</code>
|
||||
(special only if it is the first character);
|
||||
<code>]</code>
|
||||
(can be included in the set as the first character,
|
||||
after the optional <code>^</code>);
|
||||
<code>%</code> (special only if followed by a letter);
|
||||
and <code>-</code>
|
||||
(can be included in the set as the first or the last character).
|
||||
</p>
|
||||
|
||||
<p>
|
||||
Currently the pre-defined classes are similar to those from the
|
||||
Lua's string library
|
||||
(<code>%a</code> for letters,
|
||||
<code>%A</code> for non letters, etc.).
|
||||
There is also a class <code>%nl</code>
|
||||
containing only the newline character,
|
||||
which is particularly handy for grammars written inside long strings,
|
||||
as long strings do not interpret escape sequences like <code>\n</code>.
|
||||
</p>
|
||||
|
||||
|
||||
<h2><a name="func">Functions</a></h2>
|
||||
|
||||
<h3><code>re.compile (string, [, defs])</code></h3>
|
||||
<p>
|
||||
Compiles the given string and
|
||||
returns an equivalent LPeg pattern.
|
||||
The given string may define either an expression or a grammar.
|
||||
The optional <code>defs</code> table provides extra Lua values
|
||||
to be used by the pattern.
|
||||
</p>
|
||||
|
||||
<h3><code>re.find (subject, pattern [, init])</code></h3>
|
||||
<p>
|
||||
Searches the given pattern in the given subject.
|
||||
If it finds a match,
|
||||
returns the index where this occurrence starts,
|
||||
plus the captures made by the pattern (if any).
|
||||
Otherwise, returns nil.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
An optional numeric argument <code>init</code> makes the search
|
||||
starts at that position in the subject string.
|
||||
As usual in Lua libraries,
|
||||
a negative value counts from the end.
|
||||
</p>
|
||||
|
||||
<h3><code>re.match (subject, pattern)</code></h3>
|
||||
<p>
|
||||
Matches the given pattern against the given subject.
|
||||
</p>
|
||||
|
||||
<h3><code>re.updatelocale ()</code></h3>
|
||||
<p>
|
||||
Updates the pre-defined character classes to the current locale.
|
||||
</p>
|
||||
|
||||
|
||||
<h2><a name="ex">Some Examples</a></h2>
|
||||
|
||||
<h3>A complete simple program</h3>
|
||||
<p>
|
||||
The next code shows a simple complete Lua program using
|
||||
the <code>re</code> module:
|
||||
</p>
|
||||
<pre class="example">
|
||||
local re = require"re"
|
||||
|
||||
-- find the position of the first number in a string
|
||||
print(re.find("the number 423 is odd", "[0-9]+")) --> 12
|
||||
|
||||
-- similar, but also captures (and returns) the number
|
||||
print(re.find("the number 423 is odd", "{[0-9]+}")) --> 12 423
|
||||
|
||||
-- returns all words in a string
|
||||
print(re.match("the number 423 is odd", "({%a+} / .)*"))
|
||||
--> the number is odd
|
||||
</pre>
|
||||
|
||||
|
||||
<h3>Balanced parentheses</h3>
|
||||
<p>
|
||||
The following call will produce the same pattern produced by the
|
||||
Lua expression in the
|
||||
<a href="lpeg.html#balanced">balanced parentheses</a> example:
|
||||
</p>
|
||||
<pre class="example">
|
||||
b = re.compile[[ balanced <- "(" ([^()] / balanced)* ")" ]]
|
||||
</pre>
|
||||
|
||||
<h3>String reversal</h3>
|
||||
<p>
|
||||
The next example reverses a string:
|
||||
</p>
|
||||
<pre class="example">
|
||||
rev = re.compile[[ R <- (!.) -> '' / ({.} R) -> '%2%1']]
|
||||
print(rev:match"0123456789") --> 9876543210
|
||||
</pre>
|
||||
|
||||
<h3>CSV decoder</h3>
|
||||
<p>
|
||||
The next example replicates the <a href="lpeg.html#CSV">CSV decoder</a>:
|
||||
</p>
|
||||
<pre class="example">
|
||||
record = re.compile[[
|
||||
record <- ( field (',' field)* ) -> {} (%nl / !.)
|
||||
field <- escaped / nonescaped
|
||||
nonescaped <- { [^,"%nl]* }
|
||||
escaped <- '"' {~ ([^"] / '""' -> '"')* ~} '"'
|
||||
]]
|
||||
</pre>
|
||||
|
||||
<h3>Lua's long strings</h3>
|
||||
<p>
|
||||
The next example matches Lua long strings:
|
||||
</p>
|
||||
<pre class="example">
|
||||
c = re.compile([[
|
||||
longstring <- ('[' {:eq: '='* :} '[' close) -> void
|
||||
close <- ']' =eq ']' / . close
|
||||
]], {void = function () end})
|
||||
|
||||
print(c:match'[==[]]===]]]]==]===[]') --> 17
|
||||
</pre>
|
||||
|
||||
<h3>Indented blocks</h3>
|
||||
<p>
|
||||
This example breaks indented blocks into tables,
|
||||
respecting the indentation:
|
||||
</p>
|
||||
<pre class="example">
|
||||
p = re.compile[[
|
||||
block <- ({:ident:' '*:} line
|
||||
((=ident !' ' line) / &(=ident ' ') block)*) -> {}
|
||||
line <- {[^%nl]*} %nl
|
||||
]]
|
||||
</pre>
|
||||
<p>
|
||||
As an example,
|
||||
consider the following text:
|
||||
</p>
|
||||
<pre class="example">
|
||||
t = p:match[[
|
||||
first line
|
||||
subline 1
|
||||
subline 2
|
||||
second line
|
||||
third line
|
||||
subline 3.1
|
||||
subline 3.1.1
|
||||
subline 3.2
|
||||
]]
|
||||
</pre>
|
||||
<p>
|
||||
The resulting table <code>t</code> will be like this:
|
||||
</p>
|
||||
<pre class="example">
|
||||
{'first line'; {'subline 1'; 'subline 2'; ident = ' '};
|
||||
'second line';
|
||||
'third line'; { 'subline 3.1'; {'subline 3.1.1'; ident = ' '};
|
||||
'subline 3.2'; ident = ' '};
|
||||
ident = ''}
|
||||
</pre>
|
||||
|
||||
<h3>Macro expander</h3>
|
||||
<p>
|
||||
This example implements a simple macro expander.
|
||||
Macros must be defined as part of the pattern,
|
||||
following some simple rules:
|
||||
</p>
|
||||
<pre class="example">
|
||||
p = re.compile[[
|
||||
text <- {~ item* ~}
|
||||
item <- macro / [^()] / '(' item* ')'
|
||||
arg <- ' '* {~ (!',' item)* ~}
|
||||
args <- '(' arg (',' arg)* ')'
|
||||
-- now we define some macros
|
||||
macro <- ('apply' args) -> '%1(%2)'
|
||||
/ ('add' args) -> '%1 + %2'
|
||||
/ ('mul' args) -> '%1 * %2'
|
||||
]]
|
||||
|
||||
print(p:match"add(mul(a,b), apply(f,x))") --> a * b + f(x)
|
||||
</pre>
|
||||
<p>
|
||||
A <code>text</code> is a sequence of items,
|
||||
wherein we apply a substitution capture to expand any macros.
|
||||
An <code>item</code> is either a macro,
|
||||
any character different from parentheses,
|
||||
or a parenthesized expression.
|
||||
A macro argument (<code>arg</code>) is a sequence
|
||||
of items different from a comma.
|
||||
(Note that a comma may appear inside an item,
|
||||
e.g., inside a parenthesized expression.)
|
||||
Again we do a substitution capture to expand any macro
|
||||
in the argument before expanding the outer macro.
|
||||
<code>args</code> is a list of arguments separated by commas.
|
||||
Finally we define the macros.
|
||||
Each macro is a string substitution;
|
||||
it replaces the macro name and its arguments by its corresponding string,
|
||||
with each <code>%</code><em>n</em> replaced by the <em>n</em>-th argument.
|
||||
</p>
|
||||
|
||||
<h3>Patterns</h3>
|
||||
<p>
|
||||
This example shows the complete syntax
|
||||
of patterns accepted by <code>re</code>.
|
||||
</p>
|
||||
<pre class="example">
|
||||
p = [=[
|
||||
|
||||
pattern <- exp !.
|
||||
exp <- S (alternative / grammar)
|
||||
|
||||
alternative <- seq ('/' S seq)*
|
||||
seq <- prefix*
|
||||
prefix <- '&' S prefix / '!' S prefix / suffix
|
||||
suffix <- primary S (([+*?]
|
||||
/ '^' [+-]? num
|
||||
/ '->' S (string / '{}' / name)
|
||||
/ '=>' S name) S)*
|
||||
|
||||
primary <- '(' exp ')' / string / class / defined
|
||||
/ '{:' (name ':')? exp ':}'
|
||||
/ '=' name
|
||||
/ '{}'
|
||||
/ '{~' exp '~}'
|
||||
/ '{' exp '}'
|
||||
/ '.'
|
||||
/ name S !arrow
|
||||
/ '<' name '>' -- old-style non terminals
|
||||
|
||||
grammar <- definition+
|
||||
definition <- name S arrow exp
|
||||
|
||||
class <- '[' '^'? item (!']' item)* ']'
|
||||
item <- defined / range / .
|
||||
range <- . '-' [^]]
|
||||
|
||||
S <- (%s / '--' [^%nl]*)* -- spaces and comments
|
||||
name <- [A-Za-z][A-Za-z0-9_]*
|
||||
arrow <- '<-'
|
||||
num <- [0-9]+
|
||||
string <- '"' [^"]* '"' / "'" [^']* "'"
|
||||
defined <- '%' name
|
||||
|
||||
]=]
|
||||
|
||||
print(re.match(p, p)) -- a self description must match itself
|
||||
</pre>
|
||||
|
||||
|
||||
|
||||
<h2><a name="license">License</a></h2>
|
||||
|
||||
<p>
|
||||
Copyright © 2008-2010 Lua.org, PUC-Rio.
|
||||
</p>
|
||||
<p>
|
||||
Permission is hereby granted, free of charge,
|
||||
to any person obtaining a copy of this software and
|
||||
associated documentation files (the "Software"),
|
||||
to deal in the Software without restriction,
|
||||
including without limitation the rights to use,
|
||||
copy, modify, merge, publish, distribute, sublicense,
|
||||
and/or sell copies of the Software,
|
||||
and to permit persons to whom the Software is
|
||||
furnished to do so,
|
||||
subject to the following conditions:
|
||||
</p>
|
||||
|
||||
<p>
|
||||
The above copyright notice and this permission notice
|
||||
shall be included in all copies or substantial portions of the Software.
|
||||
</p>
|
||||
|
||||
<p>
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
||||
EXPRESS OR IMPLIED,
|
||||
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
|
||||
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
|
||||
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
|
||||
TORT OR OTHERWISE, ARISING FROM,
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
||||
THE SOFTWARE.
|
||||
</p>
|
||||
|
||||
</div> <!-- id="content" -->
|
||||
|
||||
</div> <!-- id="main" -->
|
||||
|
||||
<div id="about">
|
||||
<p><small>
|
||||
$Id: re.html,v 1.15 2010/11/05 12:53:43 roberto Exp $
|
||||
</small></p>
|
||||
</div> <!-- id="about" -->
|
||||
|
||||
</div> <!-- id="container" -->
|
||||
|
||||
</body>
|
||||
</html>
|
||||
Reference in New Issue
Block a user