Martin Möhrmann [Tue, 5 Jun 2018 06:14:57 +0000 (08:14 +0200)]
runtime: replace sys.CacheLineSize by corresponding internal/cpu const and vars
sys here is runtime/internal/sys.
Replace uses of sys.CacheLineSize for padding by
cpu.CacheLinePad or cpu.CacheLinePadSize.
Replace other uses of sys.CacheLineSize by cpu.CacheLineSize.
Remove now unused sys.CacheLineSize.
Updates #25203
Change-Id: I1daf410fe8f6c0493471c2ceccb9ca0a5a75ed8f
Reviewed-on: https://go-review.googlesource.com/126601
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Daniel Martí [Mon, 20 Aug 2018 20:31:49 +0000 (21:31 +0100)]
cmd/compile: cleanup walking OCONV/OCONVNOP
Use a separate func, which needs less indentation and can use returns
instead of labelled breaks. We can also give the types better names, and
we don't have to repeat the calls to conv and mkcall.
Passes toolstash -cmp on std cmd.
Change-Id: I1071c170fa729562d70093a09b7dea003c5fe26e
Reviewed-on: https://go-review.googlesource.com/130075
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
Martin Möhrmann [Fri, 24 Aug 2018 15:07:20 +0000 (17:07 +0200)]
internal/cpu: add a CacheLinePadSize constant
The new constant CacheLinePadSize can be used to compute best effort
alignment of structs to cache lines.
e.g. the runtime can use this in the locktab definition:
var locktab [57]struct {
l spinlock
pad [cpu.CacheLinePadSize - unsafe.Sizeof(spinlock{})]byte
}
Change-Id: I86f6fbfc5ee7436f742776a7d4a99a1d54ffccc8
Reviewed-on: https://go-review.googlesource.com/131237 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Alberto Donizetti [Wed, 22 Aug 2018 08:58:24 +0000 (10:58 +0200)]
time: allow +00 as numeric timezone name and GMT offset
A timezone with a zero offset from UTC and without a three-letter
abbreviation will have a numeric name in timestamps: "+00".
There are currently two of them:
$ zdump Atlantic/Azores America/Scoresbysund
Atlantic/Azores Wed Aug 22 09:01:05 2018 +00
America/Scoresbysund Wed Aug 22 09:01:05 2018 +00
These two timestamp are rejected by Parse, since it doesn't allow for
zero offsets:
parsing time "Wed Aug 22 09:01:05 2018 +00": extra text: +00
This change modifies Parse to accept a +00 offset in numeric timezone
names.
As side effect of this change, Parse also now accepts "GMT+00". It was
explicitely disallowed (with a unit test ensuring it got rejected),
but the restriction seems incorrect.
DATE(1), for example, allows it:
$ date --debug --date="2009-01-02 03:04:05 GMT+00"
date: parsed date part: (Y-M-D) 2009-01-02
date: parsed time part: 03:04:05
date: parsed zone part: UTC+00
date: input timezone: parsed date/time string (+00)
date: using specified time as starting value: '03:04:05'
date: starting date/time: '(Y-M-D) 2009-01-02 03:04:05 TZ=+00'
date: '(Y-M-D) 2009-01-02 03:04:05 TZ=+00' = 1230865445 epoch-seconds
date: timezone: system default
date: final: 1230865445.000000000 (epoch-seconds)
date: final: (Y-M-D) 2009-01-02 03:04:05 (UTC)
date: final: (Y-M-D) 2009-01-02 04:04:05 (UTC+01)
Fri 2 Jan 04:04:05 CET 2009
This fixes 2 of 17 time.Parse() failures listed in Issue #26032.
Updates #26032
Change-Id: I01cd067044371322b7bb1dae452fb3c758ed3cc2
Reviewed-on: https://go-review.googlesource.com/130696
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Martin Möhrmann [Sun, 3 Jun 2018 11:00:19 +0000 (13:00 +0200)]
runtime: do not execute write barrier on newly allocated slice in growslice
The new slice created in growslice is cleared during malloc for
element types containing pointers and therefore can only contain
nil pointers. This change avoids executing write barriers for these
nil pointers by adding and using a special bulkBarrierPreWriteSrcOnly
function that does not enqueue pointers to slots in dst to the write
barrier buffer.
Change-Id: If9b18248bfeeb6a874b0132d19520adea593bfc4
Reviewed-on: https://go-review.googlesource.com/115996
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
Martin Möhrmann [Fri, 1 Jun 2018 12:30:49 +0000 (14:30 +0200)]
runtime: replace typedmemmmove with bulkBarrierPreWrite and memmove in growslice
A bulkBarrierPreWrite together with a memmove as used in typedslicecopy
is faster than a typedmemmove for each element of the old slice that
needs to be copied to the new slice.
typedslicecopy is not used here as runtime functions should not call
other instrumented runtime functions and some conditions like dst == src
or the destination slice not being large enought that are checked for
in typedslicecopy can not happen in growslice.
Martin Möhrmann [Sat, 28 Jul 2018 08:56:48 +0000 (10:56 +0200)]
runtime: use internal/cpu variables in assembler code
Using internal/cpu variables has the benefit of avoiding false sharing
(as those are padded) and allows memory and cache usage for these variables
to be shared by multiple packages.
Change-Id: I2bf68d03091bf52b466cf689230d5d25d5950037
Reviewed-on: https://go-review.googlesource.com/126599
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
Martin Möhrmann [Thu, 10 May 2018 07:28:04 +0000 (09:28 +0200)]
cmd/compile: refactor appendslice to use newer gc code style
- add comments with builtin function signatures that are instantiated
- use Nodes type from the beginning instead of
[]*Node with a later conversion to Nodes
- use conv(x, y) helper function instead of nod(OCONV, x, y)
- factor out repeated calls to Type.Elem()
This makes the function style similar to newer functions like extendslice.
Ben Shi [Mon, 23 Jul 2018 03:16:53 +0000 (03:16 +0000)]
cmd/internal/obj: support more arm64 FP instructions
ARM64 also supports float point LDP(load pair) & STP (store pair).
The CL adds implementation and corresponding test cases for
FLDPD/FLDPS/FSTPD/FSTPS.
Change-Id: I45f112012a4e097bfaf023d029b36e6cbc7a5859
Reviewed-on: https://go-review.googlesource.com/125438
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
Dmitri Shuralyov [Thu, 23 Aug 2018 19:05:19 +0000 (15:05 -0400)]
doc/go1.11: add link to new WebAssembly wiki page
The wiki page has recently been created, and at this time it's
just a stub. It's expected that support for WebAssembly will be
evolving over time, and the wiki page can be kept updated with
helpful information, how to get started, tips and tricks, etc.
Use present tense because it's expected that there will be more
general information added by the time Go 1.11 release happens.
Also add link to https://webassembly.org/ in first paragraph.
Change-Id: I139c2dcec8f0d7fd89401df38a3e12960946693f
Reviewed-on: https://go-review.googlesource.com/131078 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Heschi Kreinick [Fri, 17 Aug 2018 20:32:08 +0000 (16:32 -0400)]
runtime: handle morestack system stack transition in gentraceback
gentraceback handles system stack transitions, but only when they're
done by systemstack(). Handle morestack() too.
I tried to do this generically but systemstack and morestack are
actually *very* different functions. Most notably, systemstack returns
"normally", just messes with $sp along the way. morestack never
returns at all -- it calls newstack, and newstack then jumps both
stacks and functions back to whoever called morestack. I couldn't
think of a way to handle both of them generically. So don't.
The current implementation does not include systemstack on the generated
traceback. That's partly because I don't know how to find its stack frame
reliably, and partly because the current structure of the code wants to
do the transition before the call, not after. If we're willing to
assume that morestack's stack frame is 0 size, this could probably be
fixed.
For posterity's sake, alternatives tried:
- Have morestack put a dummy function into schedbuf, like systemstack
does. This actually worked (see patchset 1) but more by a series of
coincidences than by deliberate design. The biggest coincidence was that
because morestack_switch was a RET and its stack was 0 size, it actually
worked to jump back to it at the end of newstack -- it would just return
to the caller of morestack. Way too subtle for me, and also a little
slower than just jumping directly.
- Put morestack's PC and SP into schedbuf, so that gentraceback could
treat it like a normal function except for the changing SP. This was a
terrible idea and caused newstack to reenter morestack in a completely
unreasonable state.
To make testing possible I did a small redesign of testCPUProfile to
take a callback that defines how to check if the conditions pass to it
are satisfied. This seemed better than making the syntax of the "need"
strings even more complicated.
Heschi Kreinick [Fri, 17 Aug 2018 20:32:02 +0000 (16:32 -0400)]
runtime: fix use of wrong g in gentraceback
gentraceback gets the currently running g to do some sanity checks, but
should use gp everywhere to do its actual work. Some noncritical checks
later accidentally used g instead of gp. This seems like it could be a
problem in many different contexts, but I noticed in Windows profiling,
where profilem calls gentraceback on a goroutine from a different
thread.
Daniel Martí [Fri, 1 Jun 2018 11:17:41 +0000 (12:17 +0100)]
cmd/vet: check embedded field tags too
We can no longer use the field's position for the duplicate field tag
warning - since we now check embedded tags, the positions may belong to
copmletely different packages.
Instead, keep track of the lowest field that's still part of the
top-level struct type that we are checking.
Finally, be careful to not repeat the independent struct field warnings
when checking fields again because they are embedded into another
struct. To do this, separate the duplicate tag value logic into a func
that recurses into embedded fields on a per-encoding basis.
Fixes #25593.
Change-Id: I3bd6e01306d8ec63c0314d25e3136d5e067a9517
Reviewed-on: https://go-review.googlesource.com/115677
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com>
cmd/compile: add sources for inlined functions to ssa.html
This CL adds the source code of all inlined functions
into the function specified in $GOSSAFUNC.
The code is appended to the sources column of ssa.html.
ssaDumpInlined is populated with references to inlined functions.
Then it is used for dumping the sources in buildssa.
The source columns contains code in following order:
target function, inlined functions sorted by filename, lineno.
This CL exports the Func.Endlineno value for inlineable functions.
It is needed to grab the source code of an imported function
inlined into the function specified in $GOSSAFUNC.
Since we print almost everything to ssa.html in the GOSSAFUNC mode,
there is a need to stop spamming stdout when user just wants to see
ssa.html.
This changes cleans output of the GOSSAFUNC debug mode.
To enable the dump of the debug data to stdout, one must
put suffix + after the function name like that:
GOSSAFUNC=Foo+
Otherwise gc will not print the IR and ASM to stdout after each phase.
AST IR is still sent to stdout because it is not included
into ssa.html. It will be fixed in a separate change.
The change adds printing out the full path to the ssa.html file.
Oryan Moshe [Fri, 3 Aug 2018 11:52:39 +0000 (14:52 +0300)]
cmd/cgo: pass explicit -O0 to the compiler
The current implementation removes all of the optimization flags from
the compiler.
Added the -O0 optimization flag after the removal loop, so go can
compile cgo on every OS consistently.
Fixes #26487
Change-Id: Ia98bca90def186dfe10f50b1787c2f40d85533da
Reviewed-on: https://go-review.googlesource.com/127755
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Brian Kessler [Tue, 19 Sep 2017 05:02:55 +0000 (22:02 -0700)]
math/big: streamline divLarge initialization
The divLarge code contained "todo"s about avoiding alias
and clear calls in the initialization of variables. By
rearranging the order of initialization and always using
an auxiliary variable for the shifted divisor, all of these
calls can be safely avoided. On average, normalizing
the divisor (shift>0) is required 31/32 or 63/64 of the
time. If one always performs the shift into an auxiliary
variable first, this avoids the need to check for aliasing of
vIn in the output variables u and z. The remainder u is
initialized via a left shift of uIn and thus needs no
alias check against uIn. Since uIn and vIn were both used,
z needs no alias checks except against u which is used for
storage of the remainder. This change has a minimal impact
on performance (see below), but cleans up the initialization
code and eliminates the "todo"s.
Emmanuel T Odeke [Wed, 22 Aug 2018 05:58:08 +0000 (22:58 -0700)]
internal/poll: use F_FULLFSYNC fcntl for FD.Fsync on OS X
As reported in #26650 and also cautioned on the man page
for fsync on OS X, fsync doesn't properly flush content
to permanent storage, and might cause corruption of data if
the OS crashes or if the drive loses power. Thus it is recommended
to use the F_FULLFSYNC fcntl, which flushes all buffered data to
permanent storage and is important for applications such as
databases that require a strict ordering of writes.
Also added a note in syscall_darwin.go that syscall.Fsync is
not invoked for os.File.Sync.
Fixes #26650.
Change-Id: Idecd9adbbdd640b9c5b02e73b60ed254c98b48b6
Reviewed-on: https://go-review.googlesource.com/130676
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
If two goroutines are racing on a map, one of them will exit
cleanly, clearing the hashWriting bit, and the other will
likely notice and panic. If we use XOR instead of OR to
set the bit in the first place, even numbers of racers will
hopefully all see the bit cleared and panic simultaneously,
giving the full set of available stacks. If a third racer
sneaks in, we are no worse than the current code, and
the generated code should be no more expensive.
In practice, this catches most racing goroutines even in
very tight races. See the demonstration program posted
on https://github.com/golang/go/issues/26703 for an example.
Martin Möhrmann [Sun, 19 Aug 2018 05:50:39 +0000 (07:50 +0200)]
fmt: print values for map keys with non-reflexive equality
Previously fmt would first obtain a list of map keys
and then look up the value for each key. Since NaNs can
be map keys but cannot be fetched directly, the lookup would
fail and return a zero reflect.Value, which formats as <nil>.
golang.org/cl/33572 added a map iterator to the reflect package
that is used in this CL to retrieve the key and value from
the map and prints the correct value even for keys that are not
equal to themselves.
Fixes #14427
Change-Id: I9e1522959760b3de8b7ecf7a6e67cd603339632a
Reviewed-on: https://go-review.googlesource.com/129777 Reviewed-by: Alan Donovan <adonovan@google.com>
cmd/compile: cache the value of environment variable GOSSAFUNC
Store the value of GOSSAFUNC in a global variable to avoid
multiple calls to os.Getenv from gc.buildssa and gc.mkinlcall1.
The latter is implemented in the CL 126606.
Brian Kessler [Tue, 8 May 2018 06:07:13 +0000 (00:07 -0600)]
math/big: optimize multiplication by 2 and 1/2 in float Sqrt
The Sqrt code previously used explicit constants for 2 and 1/2. This change
replaces multiplication by these constants with increment and decrement of
the floating point exponent directly. This improves performance by ~7-10%
for small inputs and minimal improvement for large inputs.
Martin Möhrmann [Wed, 22 Aug 2018 20:34:39 +0000 (22:34 +0200)]
runtime: skip TestGcSys on Windows
This is causing failures on TryBots and BuildBots:
--- FAIL: TestGcSys (0.06s)
gc_test.go:27: expected "OK\n", but got "using too much memory: 39882752 bytes\n"
FAIL
Yury Smolsky [Thu, 14 Jun 2018 15:20:03 +0000 (18:20 +0300)]
cmd/compile: display Go code for a function in ssa.html
This CL adds the "sources" column at the beginning of SSA table.
This column displays the source code for the function being passed
in the GOSSAFUNC env variable.
Also UI was extended so that clicking on particular line will
highlight all places this line is referenced.
JS code was cleaned and formatted.
This CL does not handle inlined functions. See issue 25904.
Ian Lance Taylor [Sat, 7 Jul 2018 05:57:35 +0000 (22:57 -0700)]
runtime: make TestGcSys actually test something
The workthegc function was being inlined, and the slice did not
escape, so there was no memory allocation. Use a sink variable to
force memory allocation, at least for now.
Fixes #23343
Change-Id: I02f4618e343c8b6cb552cb4e9f272e112785f7cf
Reviewed-on: https://go-review.googlesource.com/122576
Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Daniel Martí [Fri, 18 May 2018 17:31:05 +0000 (18:31 +0100)]
encoding/base64: slight decoding speed-up
First, use a dummy slice access on decode64 and decode32 to ensure that
there is a single bounds check for src.
Second, move the PutUint64/PutUint32 calls out of these functions,
meaning that they are simpler and smaller. This may also open the door
to inlineability in the future, but for now, they both go past the
budget.
While at it, get rid of the ilen and olen variables, which have no
impact whatsoever on performance. At least, not measurable by any of the
benchmarks.
Change-Id: I0dfbdafa2a41dc4c582f63aef94b90b8e473731c
Reviewed-on: https://go-review.googlesource.com/113776 Reviewed-by: Ian Lance Taylor <iant@golang.org>
Ian Lance Taylor [Tue, 21 Aug 2018 14:58:10 +0000 (07:58 -0700)]
regexp/syntax: don't do both linear and binary sesarch in MatchRunePos
MatchRunePos is a significant element of regexp performance, so some
attention to optimization is appropriate. Before this CL, a
non-matching rune would do both a linear search in the first four
entries, and a binary search over all the entries. Change the code to
optimize for the common case of two runes, to only do a linear search
when there are up to four entries, and to only do a binary search when
there are more than four entries.
andrius4669 [Thu, 17 May 2018 14:43:30 +0000 (14:43 +0000)]
bufio: avoid rescanning buffer multiple times in ReadSlice
When existing data in buffer does not have delimiter,
and new data is added with b.fill(), continue search from
previous point instead of starting from beginning.
Daniel Martí [Sat, 7 Jul 2018 18:07:14 +0000 (19:07 +0100)]
encoding/json: simplify some pieces of the encoder
Some WriteByte('\\') calls can be deduplicated.
fillField is used in two occasions, but it is unnecessary when adding
fields to the "next" stack, as those aren't used for the final encoding.
Inline the func with its only remaining call.
Finally, unindent a default-if block.
The performance of the encoder is unaffected:
name old time/op new time/op delta
CodeEncoder-4 6.65ms ± 1% 6.65ms ± 0% ~ (p=0.662 n=6+5)
Daniel Martí [Sun, 22 Jul 2018 11:36:15 +0000 (12:36 +0100)]
encoding/json: inline fieldByIndex
This function was only used in a single place - in the field encoding
loop within the struct encoder.
Inlining the function call manually lets us get rid of the call
overhead. But most importantly, it lets us simplify the logic afterward.
We no longer need to use reflect.Value{} and !fv.IsValid(), as we can
skip the field immediately.
The two factors combined (mostly just the latter) give a moderate speed
improvement to this hot loop.
name old time/op new time/op delta
CodeEncoder-4 6.01ms ± 1% 5.91ms ± 1% -1.66% (p=0.002 n=6+6)
name old speed new speed delta
CodeEncoder-4 323MB/s ± 1% 328MB/s ± 1% +1.69% (p=0.002 n=6+6)
Daniel Martí [Sun, 22 Jul 2018 11:26:38 +0000 (12:26 +0100)]
encoding/json: simplify the structEncoder type
structEncoder had two slices - the list of fields, and a list containing
the encoder for each field. structEncoder.encode then looped over the
fields, and indexed into the second slice to grab the field encoder.
However, this makes it very hard for the compiler to be able to prove
that the two slices always have the same length, and that the index
expression doesn't need a bounds check.
Merge the two slices into one to completely remove the need for bounds
checks in the hot loop.
While at it, don't copy the field elements when ranging, which greatly
speeds up the hot loop in structEncoder.
name old time/op new time/op delta
CodeEncoder-4 6.18ms ± 0% 5.56ms ± 0% -10.08% (p=0.002 n=6+6)
name old speed new speed delta
CodeEncoder-4 314MB/s ± 0% 349MB/s ± 0% +11.21% (p=0.002 n=6+6)
name old alloc/op new alloc/op delta
CodeEncoder-4 93.2kB ± 0% 62.1kB ± 0% -33.33% (p=0.002 n=6+6)
Filippo Valsorda [Tue, 21 Aug 2018 20:50:04 +0000 (14:50 -0600)]
crypto/tls: make ConnectionState.ExportKeyingMaterial a method
The unexported field is hidden from reflect based marshalers, which
would break otherwise. Also, make it return an error, as there are
multiple reasons it might fail.
Alberto Donizetti [Sun, 24 Jun 2018 11:42:59 +0000 (13:42 +0200)]
time: accept anything between -23 and 23 as offset namezone name
time.Parse currently rejects numeric timezones names with UTC offsets
bigger than +12, but this is incorrect: there's a +13 timezone and a
+14 timezone:
$ zdump Pacific/Kiritimati
Pacific/Kiritimati Mon Jun 25 02:15:03 2018 +14
For convenience, this cl changes the ranges of accepted offsets from
-14..+12 to -23..+23 (zero still excluded), i.e. every possible offset
that makes sense. We don't validate three-letter abbreviations for the
timezones names, so there's no need to be too strict on numeric names.
This change also fixes a bug in the parseTimeZone, that is currently
unconditionally returning true (i.e. valid timezone), without checking
the value returned by parseSignedOffset.
This fixes 5 of 17 time.Parse() failures listed in Issue #26032.
Updates #26032
Change-Id: I2f08ca9aa41ea4c6149ed35ed2dd8f23eeb42bff
Reviewed-on: https://go-review.googlesource.com/120558 Reviewed-by: Rob Pike <r@golang.org>
Jordan Rhee [Tue, 24 Jul 2018 22:13:41 +0000 (15:13 -0700)]
cmd/link: support windows/arm
Enable the Go linker to generate executables for windows/arm.
Generates PE relocation tables, which are used by Windows to
dynamically relocate the Go binary in memory. Windows on ARM
requires all modules to be relocatable, unlike x86/amd64 which are
permitted to have fixed base addresses.
Updates #26148
Change-Id: Ie63964ff52c2377e121b2885e9d05ec3ed8dc1cd
Reviewed-on: https://go-review.googlesource.com/125648
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Ilya Tocar [Tue, 21 Aug 2018 19:50:48 +0000 (14:50 -0500)]
net: use internal/bytealg insetad of linkname tricks
We are currently using go:linkname for some algorithms from
strings/bytes packages, to avoid importing strings/bytes.
But strings/bytes are just wrappers around internal/bytealg, so
we should use internal/bytealg directly.
Ilya Tocar [Tue, 21 Aug 2018 19:40:01 +0000 (14:40 -0500)]
time: optimize big4
Use the same load order in big4 as in encoding/binary.BigEndian.
This order is recognized by the compiler and converted into single load.
This isn't in the hot path, but doesn't hurt readability, so lets do this.
Andreas Auernhammer [Tue, 21 Aug 2018 14:12:36 +0000 (16:12 +0200)]
crypto/rc4: remove assembler implementations
This CL removes the RC4 assembler implementations.
RC4 is broken and should not be used for encryption
anymore. Therefore it's not worth maintaining
platform-specific assembler implementations.
The native Go implementation may be slower
or faster depending on the CPU:
vendor_id : GenuineIntel
cpu family : 6
model : 142
model name : Intel(R) Core(TM) i5-7Y54 CPU @ 1.20GHz
stepping : 9
microcode : 0x84
cpu MHz : 800.036
cache size : 4096 KB
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU @ 2.30GHz
stepping : 0
microcode : 0x1
cpu MHz : 2300.000
cache size : 46080 KB
Daniel Martí [Sun, 8 Jul 2018 15:14:35 +0000 (16:14 +0100)]
encoding/json: various minor decoder speed-ups
Reuse v.Type() and cachedTypeFields(t) when decoding maps and structs.
Always use the same data slices when in hot loops, to ensure that the
compiler generates good code. "for i < len(data) { use(d.data[i]) }"
makes it harder for the compiler.
Finally, do other minor clean-ups, such as deduplicating switch cases,
and using a switch instead of three chained ifs.
The decoder sees a noticeable speed-up, in particular when decoding
structs.
name old time/op new time/op delta
CodeDecoder-4 29.8ms ± 1% 27.5ms ± 0% -7.83% (p=0.002 n=6+6)
name old speed new speed delta
CodeDecoder-4 65.0MB/s ± 1% 70.6MB/s ± 0% +8.49% (p=0.002 n=6+6)
Michael Munday [Thu, 31 May 2018 12:06:27 +0000 (13:06 +0100)]
crypto/{aes,cipher,rand}: use binary.{Big,Little}Endian methods
Use the binary.{Big,Little}Endian integer encoding methods rather
than unsafe or local implementations. These methods are tested to
ensure they inline correctly and don't add unnecessary bounds checks,
so it seems better to use them wherever possible.
This introduces a dependency on encoding/binary to crypto/cipher. I
think this is OK because other "L3" packages already import
encoding/binary.
Alberto Donizetti [Tue, 21 Aug 2018 12:02:56 +0000 (14:02 +0200)]
cmd/gofmt: skip gofmt idempotency check on known issue
gofmt's TestAll runs gofmt on all the go files in the tree and checks,
among other things, that gofmt is idempotent (i.e. that a second
invocation does not change the input again).
There's a known bug of gofmt not being idempotent (Issue #24472), and
unfortunately the fixedbugs/issue22662.go file triggers it. We can't
just gofmt the file, because it tests the effect of various line
directives inside weirdly-placed comments, and gofmt moves those
comments, making the test useless.
Instead, just skip the idempotency check when gofmt-ing the
problematic file.
This fixes go test on the cmd/gofmt package, and a failure seen on the
longtest builder.
Updates #24472
Change-Id: Ib06300977cd8fce6c609e688b222e9b2186f5aa7
Reviewed-on: https://go-review.googlesource.com/130377 Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Ian Lance Taylor [Thu, 9 Aug 2018 00:26:23 +0000 (17:26 -0700)]
runtime: don't use linkname to refer to internal/cpu
The runtime package already imports the internal/cpu package, so there
is no reason for it to use go:linkname comments to refer to
internal/cpu functions and variables. Since internal/cpu is internal,
we can just export those names. Removing the obscurity of go:linkname
outweighs the minor additional complexity added to the internal/cpu API.
Change-Id: Id89951b7f3fc67cd9bce67ac6d01d44a647a10ad
Reviewed-on: https://go-review.googlesource.com/128755
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Tobias Klauser [Tue, 21 Aug 2018 08:48:00 +0000 (10:48 +0200)]
syscall: add S_IRWXG and S_IRWXO on Solaris
As discussed in CL 126621, these constants are already defined on Linux,
Darwin, FreeBSD and NetBSD. In order to ensure portability of existing
code using the syscall package, provide them for Solaris as well.
Tobias Klauser [Tue, 21 Aug 2018 06:31:38 +0000 (08:31 +0200)]
syscall: add S_IRWXG and S_IRWXO on OpenBSD
As discussed in CL 126621, these constants are already defined on Linux,
Darwin, FreeBSD and NetBSD. In order to ensure portability of existing
code using the syscall package, provide them for OpenBSD (and
DragonflyBSD, in a separate CL) as well.
Tobias Klauser [Tue, 21 Aug 2018 06:28:16 +0000 (08:28 +0200)]
syscall: add S_IRWXG and S_IRWXO on DragonflyBSD
As discussed in CL 126621, these constants are already defined on Linux,
Darwin, FreeBSD and NetBSD. In order to ensure portability of existing
code using the syscall package, provide them for DragonflyBSD (and
OpenBSD, in a separate CL) as well.
Daniel Martí [Sat, 7 Jul 2018 20:40:28 +0000 (21:40 +0100)]
encoding/json: remove alloc when encoding short byte slices
If the encoded bytes fit in the bootstrap array encodeState.scratch, use
that instead of allocating a new byte slice.
Also tweaked the Encoding vs Encoder heuristic to use the length of the
encoded bytes, not the length of the input bytes. Encoding is used for
allocations of up to 1024 bytes, as we measured 2048 to be the point
where it no longer provides a noticeable advantage.
Also added some benchmarks. Only the first case changes in behavior.
Daniel Martí [Sat, 7 Jul 2018 14:59:20 +0000 (15:59 +0100)]
encoding/json: encode struct field names ahead of time
Struct field names are static, so we can run HTMLEscape on them when
building each struct type encoder. Then, when running the struct
encoder, we can select either the original or the escaped field name to
write directly.
When the encoder is not escaping HTML, using the original string works
because neither Go struct field names nor JSON tags allow any characters
that would need to be escaped, like '"', '\\', or '\n'.
When the encoder is escaping HTML, the only difference is that '<', '>',
and '&' are allowed via JSON struct field tags, hence why we use
HTMLEscape to properly escape them.
All of the above lets us encode field names with a simple if/else and
WriteString calls, which are considerably simpler and faster than
encoding an arbitrary string.
While at it, also include the quotes and colon in these strings, to
avoid three WriteByte calls in the loop hot path.
Also added a few tests, to ensure that the behavior in these edge cases
is not broken. The output of the tests is the same if this optimization
is reverted.
name old time/op new time/op delta
CodeEncoder-4 7.12ms ± 0% 6.14ms ± 0% -13.85% (p=0.004 n=6+5)
name old speed new speed delta
CodeEncoder-4 272MB/s ± 0% 316MB/s ± 0% +16.08% (p=0.004 n=6+5)
name old alloc/op new alloc/op delta
CodeEncoder-4 91.9kB ± 0% 93.2kB ± 0% +1.43% (p=0.002 n=6+6)
name old allocs/op new allocs/op delta
CodeEncoder-4 0.00 0.00 ~ (all equal)