Mike Samuel [Fri, 9 Sep 2011 07:07:40 +0000 (00:07 -0700)]
exp/template/html: Grammar rules for HTML comments and special tags.
Augments type context and adds grammatical rules to handle special HTML constructs:
<!-- comments -->
<script>raw text</script>
<textarea>no tags here</textarea>
This CL does not elide comment content. I recommend we do that but
have not done it in this CL.
I used a codesearch tool over a codebase in another template language.
Based on the below I think we should definitely recognize
<script>, <style>, <textarea>, and <title>
as each of these appears frequently enough that there are few
template using apps that do not use most of them.
Of the other special tags,
<xmp>, <noscript>
are used but infrequently, and
<noframe> and friend, <listing>
do not appear at all.
We could support <xmp> even though it is obsolete in HTML5
because we already have the machinery, but I suggest we do not
support noscript since it is a normal tag in some browser
configurations.
I suggest recognizing and eliding <!-- comments -->
(but not escaping text spans) as they are widely used to
embed comments in template source. Not eliding them increases
the size of content sent over the network, and risks leaking
code and project internal details.
The template language I tested elides them so there are
no instance of IE conditional compilation directives in the
codebase but that could be a source of confusion.
The codesearch does the equivalent of
$ find . -name \*.file-extension \
| perl -ne 'print "\L$1\n" while s@<([a-z][a-z0-9])@@i' \
| sort | uniq -c | sort
The 5 uses of <plaintext> seem to be in tricky code and can be ignored.
The 2 uses of <xmp> appear in the same tricky code and can be ignored.
I also ignored end tags to avoid biasing against unary
elements and threw out some nonsense names since since the
long tail is dominated by uses of < as a comparison operator
in the template languages expression language.
I have added asterisks next to abnormal elements.
26765 div
7432 span
7414 td
4233 a
3730 tr
3238 input
2102 br
1756 li
1755 img
1674 table
1388 p
1311 th
1064 option
992 b
891 label
714 script *
519 ul
446 tbody
412 button
381 form
377 h2
358 select
353 strong
318 h3
314 body
303 html
266 link
262 textarea *
261 head
258 meta
225 title *
189 h1
176 col
156 style *
151 hr
119 iframe
103 h4
101 pre
100 dt
98 thead
90 dd
83 map
80 i
69 object
66 ol
65 em
60 param
60 font
57 fieldset
51 string
51 field
51 center
44 bidi
37 kbd
35 legend
30 nobr
29 dl
28 var
26 small
21 cite
21 base
20 embed
19 colgroup
12 u
12 canvas
10 sup
10 rect
10 optgroup
10 noscript *
9 wbr
9 blockquote
8 tfoot
8 code
8 caption
8 abbr
7 msg
6 tt
6 text
6 h5
5 svg
5 plaintext *
5 article
4 shortquote
4 number
4 menu
4 ins
3 progress
3 header
3 content
3 bool
3 audio
3 attribute
3 acronym
2 xmp *
2 overwrite
2 objects
2 nobreak
2 metadata
2 description
2 datasource
2 category
2 action
Paul Lalonde [Thu, 8 Sep 2011 06:32:40 +0000 (16:32 +1000)]
Windows: net, syscall: implement SetsockoptIPMReq(), move to winsock v2.2 for multicast support.
I don't know the protocol regarding the zsyscall files which appear to
be hand-generated, so I've re-done them and added them to the change.
The linker would catch them if gc succeeded,
but too often the cycle manifests as making the
current package and the imported copy of itself
appear as different packages, which result in
type signature mismatches that confuse users.
As a crutch, add the -p flag to say 'if you see an
import of this package, give up early'. Results in
messages like (during gotest in sort):
export_test.go:7: import "sort" while compiling that package (import cycle)
export_test.go:7: import "container/heap": package depends on "sort" (import cycle)
At this point I am very confident in exp/regexp's
behavior. It should be usable as a drop-in
replacement for regexp now.
Later CLs could introduce a CompilePOSIX
to get at traditional POSIX ``extended regular expressions''
as in egrep and also an re.MatchLongest method to
change the matching mode to leftmost longest
instead of leftmost first. On the other hand, I expect
very few people to use either.
Note that this CL will break your existing code which uses
ParseCIDR.
This CL changes ParseCIDR("172.16.253.121/28") to return
the IP address "172.16.253.121", the network implied by the
network number "172.16.253.112" and mask "255.255.255.240".
allocparams + tempname + compactframe
all knew about how to place stack variables.
Now only compactframe, renamed to allocauto,
does the work. Until the last minute, each PAUTO
variable is in its own space and has xoffset == 0.
This might break 5g. I get failures in concurrent
code running under qemu and I can't tell whether
it's 5g's fault or qemu's. We'll see what the real
ARM builders say.
There's some ambiguity in the U{url: url} case as it could be
both a map or a struct literal, but given context it's more
likely a struct, so U{url: url_} rather than U{url_: url_}.
At least that was the case for me.
Robert Griesemer [Fri, 2 Sep 2011 17:07:29 +0000 (10:07 -0700)]
godoc: minor tweaks for app-engine use
- read search index files in groutine to avoid
start-up failure on app engine because reading
the files takes too long
- permit usage of search index files and indexer
- minor cosmetic cleanups
exp/norm: added Reader and Writer and bug fixes to support these.
Needed to ensure that finding the last boundary does not result in O(n^2)-like behavior.
Now prevents lookbacks beyond 31 characters across the board (starter + 30 non-starters).
composition.go:
- maxCombiningCharacters now means exactly that.
- Bug fix.
- Small performance improvement/ made code consistent with other code.
forminfo.go:
- Bug fix: ccc needs to be 0 for inert runes.
normalize.go:
- A few bug fixes.
- Limit the amount of combining characters considered in FirstBoundary.
- Ditto for LastBoundary.
- Changed semantics of LastBoundary to not consider trailing illegal runes a boundary
as long as adding bytes might still make them legal.
trie.go:
- As utf8.UTFMax is 4, we should treat UTF-8 encodings of size 5 or greater as illegal.
This has no impact on the normalization process, but it prevents buffer overflows
where we expect at most UTFMax bytes.
Was keeping a pointer to the labeled statement in n->right,
which meant that generic traversals of the tree visited it twice.
That combined with aggressive flattening of the block
structure when possible during parsing meant that
the kinds of label: code label: code label: code sequences
generated by yacc were giving the recursion 2ⁿ paths
through the program.
Robert Griesemer [Wed, 31 Aug 2011 21:01:58 +0000 (14:01 -0700)]
godoc: more index size reduction
- KindRuns don't need to repeat SpotKind,
it is stored in each Spot
- removed extra indirection from FileRuns
to KindRuns
- slight reduction of written index size
(~500KB)
Robert Griesemer [Wed, 31 Aug 2011 01:47:15 +0000 (18:47 -0700)]
godoc index: first step towards reducing index size
- canonicalize package descriptors
- remove duplicate storage of file paths
- reduces (current) written index file by approx 3.5MB
(from 28434237B to 24686643B, or 13%)
- next step: untangle DAG (when serializing, using
gob, the index dag explodes into an index tree)
Jaroslavas Počepko [Tue, 30 Aug 2011 12:02:02 +0000 (22:02 +1000)]
runtime: windows/amd64 callbacks fixed and syscall fixed to allow using it in callbacks
Fixes #2178.
Patch2: Fixed allocating shadow space for stdcall (must be at least 32 bytes in any case)
Patch3: Made allocated chunk smaller.
Patch4: Typo
Patch5: suppress linktime warning "runtime.callbackasm: nosplit stack overflow"
Patch6: added testcase src/pkg/syscall/callback_windows_test.go
Patch7: weakly related files moved to https://golang.org/cl/4965050 https://golang.org/cl/4974041 https://golang.org/cl/4965051
Patch8: reflect changes https://golang.org/cl/4926042/
Patch9: reflect comments