Speed up A64 simulator by removing useless memcpy.

The addresses involved should always be aligned, so we can simply use
a cast, just like the ARM simulator. Even if the alignment assumption
did not hold and the platform we are running on couldn't handle
unaligned access, a few #ifdefs would be far preferable to an
unconditional memcpy. The affected member functions were the top 2 in
a profile (18% and 15%), so basically every hack is allowed here to
speed things up. :-)
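
For illustration, a minimal standalone sketch of the two access styles
and of the #ifdef fallback mentioned above. SKETCH_NO_UNALIGNED_ACCESS
and the free functions are hypothetical names, not part of V8, and the
cast variant assumes the pointer really refers to suitably aligned
Instr storage:

```cpp
#include <cstdint>
#include <cstring>

typedef uint32_t Instr;  // Assumption: one A64 instruction is 32 bits wide.

// Copy-based read: works for any alignment, at the cost of a memcpy call
// unless the compiler manages to inline it.
inline Instr ReadInstrMemcpy(const void* address) {
  Instr bits;
  std::memcpy(&bits, address, sizeof(bits));
  return bits;
}

// Cast-based read: a plain load, valid only for aligned Instr storage.
inline Instr ReadInstrCast(const void* address) {
  return *reinterpret_cast<const Instr*>(address);
}

// Cast-based write, mirroring SetInstructionBits.
inline void WriteInstrCast(void* address, Instr new_instr) {
  *reinterpret_cast<Instr*>(address) = new_instr;
}

// Hypothetical opt-out for platforms that cannot handle unaligned access.
#if defined(SKETCH_NO_UNALIGNED_ACCESS)
inline Instr ReadInstr(const void* address) { return ReadInstrMemcpy(address); }
#else
inline Instr ReadInstr(const void* address) { return ReadInstrCast(address); }
#endif
```

Both readers return the same bits for aligned input, so the choice
only affects how the load is emitted.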

Removed some dead code for literals on the way. If we need to
resurrect it, we should do it without double(!) memcpys.
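
One possible shape for such a resurrection, as a sketch only:
ReadLiteral is a hypothetical helper, not existing V8 code. It copies
the raw bytes straight into the target type, so a floating-point
literal needs one copy instead of an integer read followed by a
separate bit conversion. It assumes float and double use the same
IEEE-754 layout as the stored literal:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a literal of type T with a single memcpy.
template <typename T>
inline T ReadLiteral(const void* literal_address) {
  T value;
  std::memcpy(&value, literal_address, sizeof(value));
  return value;
}
```

With this, ReadLiteral<float>(address) would take the place of
rawbits_to_float(Literal32()).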

Generally, I still don't understand why we need the Instr/Instruction
distinction instead of simply wrapping Instr within Instruction. That
seems much simpler and cleaner, but it would involve heavier changes.

The overall speedup of this CL is roughly 37%; see the numbers below
for a reduced Octane suite and the check targets:

------------------------------------------------------------
With memcpy:
------------------------------------------------------------

make -j32 a64.release.quickcheck => 03:29
make -j32 a64.release.check      => 11:30
Reduced Octane suite             => 05:16
Richards: 35.1
DeltaBlue: 64.1
RayTrace: 130
Splay: 66.1
SplayLatency: 619
NavierStokes: 58.7
PdfJS: 89.6
Mandreel: 58.5
MandreelLatency: 242
CodeLoad: 5103
Box2D: 124
----
Score (version 9): 144

------------------------------------------------------------
With casts:
------------------------------------------------------------
make -j32 a64.release.quickcheck => 02:14
make -j32 a64.release.check      => 07:21
Reduced Octane suite             => 03:21
Richards: 53.3
DeltaBlue: 103
RayTrace: 205
Splay: 95.9
SplayLatency: 859
NavierStokes: 103
PdfJS: 136
Mandreel: 94.8
MandreelLatency: 386
CodeLoad: 6493
Box2D: 179
----
Score (version 9): 219
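
The wall-clock figures above can be turned into a relative time
reduction with a small helper. This is only a sketch; Seconds and
TimeReductionPercent are hypothetical names used for illustration:

```cpp
// Convert a mm:ss wall-clock time into seconds.
constexpr int Seconds(int minutes, int seconds) {
  return minutes * 60 + seconds;
}

// Relative reduction in run time, in percent of the original time.
constexpr double TimeReductionPercent(int before_s, int after_s) {
  return 100.0 * (before_s - after_s) / before_s;
}
```

For example, TimeReductionPercent(Seconds(3, 29), Seconds(2, 14))
covers the quickcheck target. All three timings come out at about 36%,
close to the rough figure quoted above.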

R=ulan@chromium.org

Review URL: https://codereview.chromium.org/195873009

git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@19929 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
parent 62592f34
@@ -119,14 +119,12 @@ enum Reg31Mode {
 class Instruction {
  public:
-  Instr InstructionBits() const {
-    Instr bits;
-    memcpy(&bits, this, sizeof(bits));
-    return bits;
+  V8_INLINE Instr InstructionBits() const {
+    return *reinterpret_cast<const Instr*>(this);
   }
-  void SetInstructionBits(Instr new_instr) {
-    memcpy(this, &new_instr, sizeof(new_instr));
+  V8_INLINE void SetInstructionBits(Instr new_instr) {
+    *reinterpret_cast<Instr*>(this) = new_instr;
   }
   int Bit(int pos) const {
@@ -369,28 +367,6 @@ class Instruction {
     return reinterpret_cast<uint8_t*>(this) + offset;
   }
-  uint32_t Literal32() {
-    uint32_t literal;
-    memcpy(&literal, LiteralAddress(), sizeof(literal));
-    return literal;
-  }
-  uint64_t Literal64() {
-    uint64_t literal;
-    memcpy(&literal, LiteralAddress(), sizeof(literal));
-    return literal;
-  }
-  float LiteralFP32() {
-    return rawbits_to_float(Literal32());
-  }
-  double LiteralFP64() {
-    return rawbits_to_double(Literal64());
-  }
   Instruction* NextInstruction() {
     return this + kInstructionSize;
   }