on the Curl Web Content Markup Language

on the Curl Web Content Markup and Programming Language from www.curl.com and www.curlap.com
Showing posts with label light-weight markup. Show all posts

Monday, June 4, 2012

serialization of an off-line text resource


To see the effect of my Curl serialization effort, look at the speed of the small daily pages such as http://www.aule-browser.com/kanji/poets/basho-mee-indexed.html

The new daily page was generated quickly and it loads very fast. I have already cycled twice through the BIN serialization files as I find minor transcription errors in the principal text resource, but that text resource no longer needs to reside on the applet web server!

Here is a snapshot for





Monday, May 28, 2012

Curl data versus JSON data (light-weight markup)

Here is the top of a validated JSON katakana data file for Japanese e-learning:

{"katakana": [
{"ucs": "30AB", "utf-8": "E382AB", "kana": "カ", "info": "katakana letter KA"},
{"ucs": "30AC", "utf-8": "E382AC", "kana": "ガ", "info": "katakana letter GA"},
{"ucs": "30AD", "utf-8": "E382AD", "kana": "キ", "info": "katakana letter KI"},
{"ucs": "30AE", "utf-8": "E382AE", "kana": "ギ", "info": "katakana letter GI"},

and here is the Curl:

{let katakana-array:{Array-of Katakana} = {new {Array-of Katakana},
{Katakana "30AB", "E382AB", "カ", "katakana letter KA"},
{Katakana "30AC", "E382AC", "ガ", "katakana letter GA"},
{Katakana "30AD", "E382AD", "キ", "katakana letter KI"},
{Katakana "30AE", "E382AE", "ギ", "katakana letter GI"},

In Curl, both require field definitions for processing, except that the Curl data requires only a minimal class definition and a default constructor that assigns every field (a simple value class).

Of course, both could have been reduced to mere arrays of strings, but then iteration over the data would use no tags or keys. The Curl version is tagged, but internally:

{define-value-class public final Katakana
  field private constant ucs-code:String || = "0000"
  field private constant utf8-code:String || = "000000"
  field private constant kana-char:String || = {String '\u5B57'} || "字"
  field private constant kana-name:String || = "Ji"
  {getter public {ucs}:String
    {return self.ucs-code}
  }
  {getter public {utf-8}:String
    {return self.utf8-code}
  }
  {getter public {katakana}:String
    {return self.kana-char}
  }
  {getter public {character}:String
    {return self.kana-name}
  }
  {constructor {default ucs:String, utf:String, kana:String, info:String}
    set self.ucs-code = ucs
    set self.utf8-code = utf
    set self.kana-char = kana
    set self.kana-name = info
  }
}
{include "./katakana-unicode.scurl"}

The iterator block accesses each instance as, e.g., val.ucs and so forth.
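
For readers who do not use Curl, the same positional-record idea can be sketched in Python. This is only an illustration, not the applet's code: the `Katakana` dataclass, the `describe` helper, and the field name `utf_8` (standing in for `utf-8`, which is not a valid Python identifier) are assumptions chosen to mirror the getters above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Katakana:
    """Immutable record, loosely mirroring the Curl value class (hypothetical)."""
    ucs: str        # Unicode code point as hex, e.g. "30AB"
    utf_8: str      # UTF-8 bytes as hex, e.g. "E382AB"
    katakana: str   # the character itself
    character: str  # the Unicode character name


# Positional construction, as in the Curl data file: no keys in the data.
katakana_array = [
    Katakana("30AB", "E382AB", "カ", "katakana letter KA"),
    Katakana("30AC", "E382AC", "ガ", "katakana letter GA"),
    Katakana("30AD", "E382AD", "キ", "katakana letter KI"),
    Katakana("30AE", "E382AE", "ギ", "katakana letter GI"),
]


def describe(val: Katakana) -> str:
    """Format one record using attribute access, like val.ucs in the iterator."""
    return f"U+{val.ucs} {val.katakana} ({val.character})"


lines = [describe(val) for val in katakana_array]
for line in lines:
    print(line)
```

The data rows carry no tags, yet the loop body still reads by name, which is the sense in which the Curl version is tagged internally.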

In fairness, the JSON could have been

{"katakana-array": [
{"katakana": {"ucs": "30AB", "utf-8": "E382AB", "kana": "カ", "info": "katakana letter KA"}},

but that would have further complicated iterating over the data.
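
A short Python sketch of the difference between the two JSON layouts when iterating. Two assumptions are made so that the fragments parse: the closing `]}` is added (only the top of each file is shown in this post), and the inner value of the wrapped form is written as a JSON object in braces, since a JSON array cannot hold key/value pairs.

```python
import json

# Flat layout: each array element is the record itself.
flat = json.loads("""
{"katakana": [
 {"ucs": "30AB", "utf-8": "E382AB", "kana": "カ", "info": "katakana letter KA"},
 {"ucs": "30AC", "utf-8": "E382AC", "kana": "ガ", "info": "katakana letter GA"}
]}
""")

# Wrapped layout: each element is a one-key object around the record.
nested = json.loads("""
{"katakana-array": [
 {"katakana": {"ucs": "30AB", "utf-8": "E382AB", "kana": "カ", "info": "katakana letter KA"}},
 {"katakana": {"ucs": "30AC", "utf-8": "E382AC", "kana": "ガ", "info": "katakana letter GA"}}
]}
""")

# One key lookup per record in the flat layout ...
flat_kana = [row["kana"] for row in flat["katakana"]]

# ... and an extra level of indirection per record in the wrapped layout.
nested_kana = [row["katakana"]["kana"] for row in nested["katakana-array"]]

print(flat_kana, nested_kana)
```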

In the Curl applet the Curl data is processed dramatically faster, naturally.

The JSON data can be used anywhere, e.g., by Pharo Smalltalk or jQuery in web page widgets.

Note: the UTF-8 can be used to urlencode:  "E382B6" becomes %E3%82%B6 for a URL.
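
The conversion in the note is mechanical: split the hex string into bytes and prefix each with `%`. Here is a minimal Python sketch; the helper name `hex_to_percent` is hypothetical, and the standard library's `urllib.parse` is used only to cross-check the result.

```python
from urllib.parse import quote, unquote


def hex_to_percent(utf8_hex: str) -> str:
    """Turn a hex string of UTF-8 bytes into percent-encoded form."""
    raw = bytes.fromhex(utf8_hex)  # raises ValueError on malformed hex
    return "".join(f"%{byte:02X}" for byte in raw)


encoded = hex_to_percent("E382B6")
decoded = unquote(encoded)  # percent-decoding assumes UTF-8 by default

print(encoded)  # %E3%82%B6
print(decoded)  # ザ
```

Going the other way, `quote("ザ")` produces the same percent-encoded string directly from the character.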

Here is one result using the Curl data:

 

The applet is located at www.aule-browser.com/kanji/kana-charts.html



Saturday, January 21, 2012

netstring markdown md2curl

1)  add a Curl class for parsing netstrings such as
23:4:this,1: ,9:netstring,,
with a docs note on handling the true length of UTF-8 strings in light-weight markup.  A PEG parser?

2)  md2curl is available as a name for a markdown-to-Curl parser.  See the Java PEG parser.  Q: what are the advantages of having Traits for parsers?
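
For item 1, here is a rough sketch of what such a netstring class would have to do, written in Python rather than Curl. The function names are hypothetical. It assumes the usual netstring convention that the length prefix counts payload bytes, so the outer length of a nested netstring includes the inner prefixes and commas, and a UTF-8 string is measured in bytes rather than characters.

```python
def encode_netstring(payload: bytes) -> bytes:
    """Wrap payload as <byte length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstring(data: bytes, pos: int = 0):
    """Return (payload, next_pos) for the netstring starting at pos."""
    colon = data.index(b":", pos)  # raises ValueError if no colon is found
    length_field = data[pos:colon]
    if not length_field.isdigit():
        raise ValueError("length prefix must be ASCII digits")
    start = colon + 1
    end = start + int(length_field)
    if data[end:end + 1] != b",":
        raise ValueError("missing trailing comma or truncated payload")
    return data[start:end], end + 1


def decode_all(data: bytes):
    """Decode a run of back-to-back netstrings."""
    items, pos = [], 0
    while pos < len(data):
        item, pos = decode_netstring(data, pos)
        items.append(item)
    return items


words = ["this", " ", "netstring"]
inner = b"".join(encode_netstring(w.encode("utf-8")) for w in words)
outer = encode_netstring(inner)
print(outer)  # b'23:4:this,1: ,9:netstring,,'

# One katakana character is three UTF-8 bytes, so its prefix is 3, not 1.
kana = encode_netstring("カ".encode("utf-8"))
print(kana)   # b'3:\xe3\x82\xab,'
```

Decoding reverses the steps: `decode_netstring(outer)` yields the inner run, and `decode_all` on that run yields the three words.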

note: the true length of a presented string also varies with Unicode combining marks, such as the accents used when presenting Russian text in e-learning applications.  Given these issues, a parser in Icon, UNICON or Object Icon seems appealing as an alternative to PROLOG.  Another alternative: compiled Red (Rebol-like).
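
To make the length issue concrete, here is a small Python sketch that measures one stressed Russian word three ways. The `lengths` helper is hypothetical, and counting non-combining code points is only a rough stand-in for the number of characters a reader sees; full grapheme segmentation is more involved.

```python
import unicodedata


def lengths(text: str) -> dict:
    """Measure a string in UTF-8 bytes, code points, and non-combining code points."""
    return {
        "utf8_bytes": len(text.encode("utf-8")),
        "code_points": len(text),
        # combining() returns 0 for base characters, non-zero for marks
        "base_chars": sum(1 for ch in text if unicodedata.combining(ch) == 0),
    }


# "вода" with a stress mark: the final letter is followed by U+0301
# COMBINING ACUTE ACCENT.
stressed = "\u0432\u043e\u0434\u0430\u0301"
result = lengths(stressed)
print(result)  # {'utf8_bytes': 10, 'code_points': 5, 'base_chars': 4}

# A letter that exists precomposed also has a two-code-point decomposed form.
short_i = "\u0439"                                   # й
decomposed = unicodedata.normalize("NFD", short_i)   # и + U+0306 COMBINING BREVE
print(len(short_i), len(decomposed))                 # 1 2
```

A length prefix has to pick one of these measures; a byte count is unambiguous but says nothing about what is displayed.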

ironic, in a way, that we move to 64-bit while still shy of UTF-16, even as we get terabyte DASD and higher-speed networks: 16-bit text could have been planned as the markdown delight (we might even have been spared XML).

I have a link to this post over at the global Curl community.