Ryzen public erratas
Mike Tancsa
mike at sentex.net
Wed Jun 13 20:41:06 UTC 2018
On 6/13/2018 6:35 AM, Konstantin Belousov wrote:
> Today I noted that AMD published the public errata document for Ryzens,
> https://developer.amd.com/wp-content/resources/55449_1.12.pdf
>
> Some of the issues listed there look quite relevant to the potential
> hangs that some people still experience with the machines. I wrote
> a script which should apply the recommended workarounds to the erratas
> that I find interesting.
>
> To run it, kldload cpuctl, then apply the latest firmware update to your
> CPU, then run the following shell script. Comments indicate the errata
> number for the workarounds.
Hi,
tl;dr: The microcode changes seem to fix a hard lockup I was able to
reliably reproduce back in Feb.
The BIOS on my AMD is pretty up to date. I think it has the same
microcode as what's in the ports. x86info -a shows
root@ryzenbsd11:/home/mdtancsa # x86info -a | grep -i microc
Microcode patch level: 0x8001137
root@ryzenbsd11:/home/mdtancsa #
The patch level is the same after running the microcode update:
root@ryzenbsd11:/home/mdtancsa # /usr/local/etc/rc.d/microcode_update onestart
Updating CPU Microcode...
Done.
root@ryzenbsd11:/home/mdtancsa # x86info -a | grep -i microc
Microcode patch level: 0x8001137
root@ryzenbsd11:/home/mdtancsa #
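The before/after comparison can also be scripted. This is only a minimal sketch, assuming x86info prints a line of the form "Microcode patch level: 0x..." as in the output above and that the microcode_update rc script lives at the path used above. The function names patch_level and compare_patch_level are made up for illustration.

```sh
#!/bin/sh
# Sketch: compare the microcode patch level before and after an update.

# Read x86info output on stdin and print the first reported patch level.
patch_level() {
    sed -n 's/^[[:space:]]*Microcode patch level: *//p' | head -n 1
}

# Hypothetical helper: run the update and report whether the level changed.
compare_patch_level() {
    before=$(x86info -a | patch_level)
    /usr/local/etc/rc.d/microcode_update onestart
    after=$(x86info -a | patch_level)
    if [ "$before" = "$after" ]; then
        echo "patch level unchanged: $before"
    else
        echo "patch level changed: $before -> $after"
    fi
}

# Only run on a host that has both x86info and the rc script installed.
if command -v x86info >/dev/null 2>&1 &&
   [ -x /usr/local/etc/rc.d/microcode_update ]; then
    compare_patch_level
fi
```

Run it as root. An unchanged level, as seen here, suggests the BIOS already loaded the same microcode that the port provides.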
However, the dmesg after the microcode update adds this line
AMD Extended Feature Extensions ID EBX=0x1007<CLZERO,IRPerf,XSaveErPtr>
CPU: AMD Ryzen 5 1600X Six-Core Processor (3593.36-MHz K8-class CPU)
Origin="AuthenticAMD" Id=0x800f11 Family=0x17 Model=0x1 Stepping=1
Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
Features2=0x7ed8320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
AMD Features2=0x35c233ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX>
Structured Extended Features=0x209c01a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA>
XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
TSC: P-state invariant, performance statistics
I ran the script
root@ryzenbsd11:/home/mdtancsa # cat fix.sh
#!/bin/sh
# Enable workarounds for erratas listed in
# https://developer.amd.com/wp-content/resources/55449_1.12.pdf
# 1057, 1109
sysctl machdep.idle_mwait=0
sysctl machdep.idle=hlt
for x in /dev/cpuctl*; do
    # 1021
    cpucontrol -m '0xc0011029|=0x2000' $x
    # 1033
    cpucontrol -m '0xc0011020|=0x10' $x
    # 1049
    cpucontrol -m '0xc0011028|=0x10' $x
    # 1095
    cpucontrol -m '0xc0011020|=0x200000000000000' $x
    echo $x
done
root@ryzenbsd11:/home/mdtancsa # sh ./fix.sh
machdep.idle_mwait: 1 -> 0
machdep.idle: acpi -> hlt
/dev/cpuctl0
/dev/cpuctl1
/dev/cpuctl10
/dev/cpuctl11
/dev/cpuctl2
/dev/cpuctl3
/dev/cpuctl4
/dev/cpuctl5
/dev/cpuctl6
/dev/cpuctl7
/dev/cpuctl8
/dev/cpuctl9
root@ryzenbsd11:/home/mdtancsa #
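The script only prints the device names, so it does not show whether the bits actually stuck. A rough way to check is to read each MSR back and test the mask. This is a sketch, assuming cpucontrol -m with a bare MSR number prints the value as "MSR 0xc0011029: 0x00000000 0x00002000" (upper 32 bits first, then lower 32 bits). The function names are hypothetical, and the MSR numbers and masks are the ones from fix.sh.

```sh
#!/bin/sh
# Sketch: read back the MSRs touched by fix.sh and check the workaround bits.

# msr_has_bits HIGH LOW MASK
# HIGH and LOW are the upper and lower 32 bits of the MSR value.
# Succeeds only if every bit of MASK is set in the combined 64-bit value.
msr_has_bits() {
    value=$(( ($1 << 32) | $2 ))
    [ $(( value & $3 )) -eq $(( $3 )) ]
}

# check_msr DEVICE MSR MASK
# Assumes the cpucontrol output format described above.
check_msr() {
    dev=$1 msr=$2 mask=$3
    out=$(cpucontrol -m "$msr" "$dev") || return 2
    set -- $out    # split into: MSR, "<msr>:", high, low
    if msr_has_bits "$3" "$4" "$mask"; then
        echo "$dev $msr $mask set"
    else
        echo "$dev $msr $mask NOT set"
    fi
}

verify_workarounds() {
    for x in /dev/cpuctl*; do
        check_msr "$x" 0xc0011029 0x2000               # 1021
        check_msr "$x" 0xc0011020 0x10                 # 1033
        check_msr "$x" 0xc0011028 0x10                 # 1049
        check_msr "$x" 0xc0011020 0x200000000000000    # 1095
    done
}

# Only run where the cpuctl devices exist (cpuctl loaded, run as root).
if [ -e /dev/cpuctl0 ]; then
    verify_workarounds
fi
```

Since the MSR writes do not survive a reboot, a check like this is also useful to confirm that fix.sh has been rerun after the machine comes back up.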
Using a FreeBSD stable from back in Feb, I was able to crash Ryzen- and
Epyc-based systems
(https://lists.freebsd.org/pipermail/freebsd-stable/2018-February/088439.html)
by generating a lot of traffic between the hypervisor and guests. The
same tests on an Intel-based box ran just fine.
e.g. start 3 guests in bhyve (amd64) and run combos of iperf3 between
them. It would not take long before the box hard locked, i.e.
blank screen, no crash dump, etc.
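The "combos of iperf3" can be generated rather than typed by hand. The following is only a sketch of one way to do it, not the exact test that was run. The guest addresses, the use of ssh as root, and the 600 second duration are all assumptions, and it expects "iperf3 -s" to be running already on every guest. It prints the client commands instead of executing them.

```sh
#!/bin/sh
# Sketch: print one iperf3 client command per ordered pair of guests.

# Hypothetical guest addresses; replace with the real bhyve guests.
GUESTS="192.0.2.11 192.0.2.12 192.0.2.13"
# Seconds per iperf3 run (assumption).
DURATION=600

iperf_commands() {
    for src in $GUESTS; do
        for dst in $GUESTS; do
            # Skip a guest talking to itself.
            [ "$src" = "$dst" ] && continue
            echo "ssh root@$src iperf3 -c $dst -t $DURATION"
        done
    done
}

iperf_commands
```

With three guests this prints six commands. Running each line in the background from the hypervisor would start all the flows at once.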
With the latest microcode update, I have been running the same sort of
tests and so far so good. I will let them run overnight to see if things
are now stable on STABLE.
---Mike
>
> Please report the results. If the script helps, I will code the kernel
> change to apply the workarounds.
>
> #!/bin/sh
>
> # Enable workarounds for erratas listed in
> # https://developer.amd.com/wp-content/resources/55449_1.12.pdf
>
> # 1057, 1109
> sysctl machdep.idle_mwait=0
> sysctl machdep.idle=hlt
>
> for x in /dev/cpuctl*; do
> # 1021
> cpucontrol -m '0xc0011029|=0x2000' $x
> # 1033
> cpucontrol -m '0xc0011020|=0x10' $x
> # 1049
> cpucontrol -m '0xc0011028|=0x10' $x
> # 1095
> cpucontrol -m '0xc0011020|=0x200000000000000' $x
> done
>
> _______________________________________________
> freebsd-current at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe at freebsd.org"
>
>
--
-------------------
Mike Tancsa, tel +1 519 651 3400 x203
Sentex Communications, mike at sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada