
Maximizing Six-Core AMD Opteron™ Processor Performance with RHEL

Bhavna Sarathy, Red Hat Technical Lead, AMD
Sanjay Rao, Senior Software Engineer, Red Hat
Sept 4, 2009

1 Red Hat Summit 2009 | Bhavna Sarathy

Agenda

· Six-Core AMD Opteron™ processor codenamed "Istanbul": overview
· Six-Core AMD Opteron™ processor feature support
  · Continued virtualization support
  · New innovations
· Red Hat Enterprise Linux software support
· Performance benchmarking results
· Conclusions


Six-Core AMD Opteron™ Processor ("Istanbul")

· Six true cores
· New HT Assist feature for HyperTransport™ technology
· Increased HyperTransport™ 3.0 technology (HT3) bandwidth
· Higher-performing integrated memory controller
· Same power/thermal envelopes as the Quad-Core AMD Opteron™ processor
· Continued AMD Virtualization™ (AMD-V™) technology support, including Rapid Virtualization Indexing


Prior Generation Innovations that Continue

All the performance-enhancing features of the Quad-Core AMD Opteron™ processor:
· AMD Wide Floating-Point Accelerator
· AMD Memory Optimizer Technology
· Dual Dynamic Power Management™
· AMD Balanced Smart Cache
· HyperTransport™ 3 technology
· AMD Virtualization™ (AMD-V™) technology

[Diagram: VM1 and VM2 with Virtual Memory 1 and Virtual Memory 2, mapped onto physical memory; two CPUs linked by HT3]


Prior Generation Innovations that Continue

All the power-efficiency features of the Quad-Core AMD Opteron™ processor:
· Independent Dynamic Core Technology
· AMD CoolCore™ Technology
· Dual Dynamic Power Management™
· Low-power DDR2 memory
· AMD PowerCap manager
· Core Select
· AMD Smart Fetch technology


New Innovations in the Six-Core AMD Opteron™ Processor ("Istanbul")

· Six cores per socket
  · Six-core support for the Socket F (1207) infrastructure
  · Improved performance compared to the Quad-Core AMD Opteron™ processor
· HT Assist in multi-socket systems:
  · Reduces probe traffic
  · Resolves probes more quickly
· Higher HyperTransport™ 3.0 technology speeds
  · Support for up to 4.8 GT/s per link
  · Improved overall system performance


HT Assist: What is it?

· Micro-architectural feature of the Six-Core AMD Opteron™ processor
· Helps reduce memory latency
· Helps increase overall system performance in 4-socket and 8-socket systems
· Improves HyperTransport™ technology link efficiency and increases performance by:
  · Reducing probe traffic
  · Resolving probes more quickly
· Probe "broadcasting" can be eliminated in 8 of 11 typical CPU-to-CPU transactions


HT Assist: How does it work?

Query example:

[Diagram: a four-socket system (CPU 1 to CPU 4, each with an L3 cache) answering the same query. Without HT Assist: 10 transactions in total. With HT Assist: 2 transactions in total. Legend: probe request, probe response, data request, data response, L3 directory, directory read.]
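The 10-versus-2 totals in the query example can be reproduced with a simple counting model. This is a sketch under our own assumptions, not AMD's protocol specification: without a directory, the home node is assumed to broadcast a probe to every socket (including the requester) and every socket returns a probe response; with HT Assist, a directory hit showing no remote copy leaves only the data request and the data response. The function names are hypothetical.

```python
def broadcast_transactions(sockets: int) -> int:
    """Transactions per read under the ASSUMED broadcast model:
    1 data request + 1 probe per socket + 1 probe response per socket + 1 data response."""
    return 1 + sockets + sockets + 1


def directory_transactions(sharers_to_probe: int) -> int:
    """Transactions per read under the ASSUMED directory model: the data request and
    data response, plus one directed probe and its response for each sharer the
    directory names. The directory read itself stays local to the home node."""
    return 2 + 2 * sharers_to_probe


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(f"{n} sockets: {broadcast_transactions(n)} without HT Assist, "
              f"{directory_transactions(0)} with HT Assist (no remote copy)")
```

Under these assumptions the broadcast cost grows with socket count (18 transactions at 8 sockets) while the directory case stays at 2 when no other cache holds the line, which is consistent with HT Assist mattering most in 4- and 8-socket systems.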


HT Assist: What is the cache directory?

· HT Assist is a sparse directory cache
  · Associated with the memory controller of the home node
  · Tracks all of the home node's lines that are cached anywhere in the system
  · Logically part of the memory controller
  · Physically located in the L3 cache, where it occupies 1 MB
· For many transactions, it eliminates probe broadcasts
  · The host CPU knows exactly which CPU to probe for the data
  · Local accesses get local DRAM latency
  · Less queuing delay due to lower HT traffic overhead
· The result is reduced latency and increased system performance in multi-socket systems
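To make the directory idea concrete, here is a toy Python model of a sparse directory at one home node. Everything in it is hypothetical (the class name, capacity counted in entries, least-recently-filled replacement); it only illustrates the behaviour described above: a lookup tells the home node which CPUs, if any, need a directed probe, and because the directory is sparse, evicting an entry means the cached copies it tracked would have to be invalidated.

```python
from __future__ import annotations

from collections import OrderedDict


class SparseDirectory:
    """Toy home-node directory (hypothetical; not AMD's actual HT Assist design).

    Maps each cache line homed at this node to the set of nodes that may cache it.
    A line with no entry is assumed to be cached nowhere, so DRAM can answer directly.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity  # number of directory entries (illustrative)
        self.entries: OrderedDict[int, set[int]] = OrderedDict()

    def probes_needed(self, line: int, requester: int) -> set[int]:
        """Nodes that need a directed probe; an empty set means no probe at all."""
        return self.entries.get(line, set()) - {requester}

    def record_fill(self, line: int, node: int) -> list[tuple[int, set[int]]]:
        """Record that `node` now caches `line`.

        Returns the (line, sharers) entries evicted to make room; in a sparse
        directory those cached copies would have to be invalidated.
        """
        evicted: list[tuple[int, set[int]]] = []
        if line in self.entries:
            self.entries.move_to_end(line)  # refresh: least-recently-filled is evicted first
        else:
            while len(self.entries) >= self.capacity:
                evicted.append(self.entries.popitem(last=False))
            self.entries[line] = set()
        self.entries[line].add(node)
        return evicted
```

For example, after `record_fill(0x40, node=2)`, a request for line `0x40` from node 1 needs exactly one directed probe (to node 2), while a request for an untracked line needs none.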


HT Assist: What is the result?

· Helps reduce memory latency
· Helps increase overall system performance
· 4-socket STREAM memory bandwidth improves by more than 60% (42 GB/s with HT Assist vs. 25.5 GB/s without HT Assist)*
· Can result in faster query times, which can increase performance for cache-sensitive applications:
  · Database
  · Virtualization
  · HPC

*See backup slides for performance and configuration information.
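The percentage can be recomputed from the STREAM figures listed on the backup slide: 42 / 25.5 ≈ 1.65, a gain of roughly 65%. A small Python check (the dictionary keys are our own labels for the four configurations):

```python
# STREAM results in GB/s, as listed on the backup slide (4-socket systems, 32 GB memory).
STREAM_GBPS = {
    "Istanbul 8435, HT Assist on": 42.0,
    "Istanbul 8435, HT Assist off": 25.5,
    "Shanghai 8384": 24.0,
    "Dunnington E7450": 9.0,
}


def pct_gain(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old` (positive = improvement)."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    baseline = STREAM_GBPS["Istanbul 8435, HT Assist off"]
    for name, gbps in STREAM_GBPS.items():
        print(f"{name:30s} {gbps:5.1f} GB/s  {pct_gain(gbps, baseline):+6.1f}% vs. HT Assist off")
```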


HyperTransport™ 3.0 Technology

· Advantages of HyperTransport™ 3.0 technology (HT3):
  · Compared to HyperTransport™ 1.0 technology (HT1), improves system bandwidth between CPUs and I/O
  · Increased interconnect rate (from 2 GT/s with HT1 up to 4.8 GT/s per link with HT3)
  · Improves overall system balance and scalability, especially in commercial applications (database, web server, etc.)
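The transfer rates translate into peak link bandwidth as rate x width. The sketch below assumes a full 16-bit link (2 bytes per transfer in each direction); narrower links, protocol overhead, and the link width actually configured on a given board are not modelled.

```python
LINK_WIDTH_BITS = 16  # assumption: full-width 16-bit HyperTransport link


def link_bandwidth_gbs(gt_per_s: float, width_bits: int = LINK_WIDTH_BITS) -> float:
    """Peak one-direction bandwidth in GB/s: transfers/s x bytes per transfer."""
    return gt_per_s * width_bits / 8


if __name__ == "__main__":
    for label, rate in (("HT1 @ 2.0 GT/s", 2.0), ("HT3 @ 4.8 GT/s", 4.8)):
        one_way = link_bandwidth_gbs(rate)
        print(f"{label}: {one_way:.1f} GB/s per direction, {2 * one_way:.1f} GB/s both directions")
```

Under that assumption HT3 at 4.8 GT/s gives 9.6 GB/s per direction versus 4.0 GB/s for HT1 at 2 GT/s, a 2.4x increase per link.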


Six-Core AMD Opteron™ Processor Support for Red Hat Enterprise Linux®

· Excellent relationship with Red Hat
  · Hardware enablement
  · Virtualization and performance collaboration
· The Six-Core AMD Opteron™ processor works best with RHEL 5.4
· Continued support for AMD-V™ with Rapid Virtualization Indexing
· Continued support for the AMD PowerNow!™ technology driver
· Continued support for Xen 2 MB super pages

Six-Core AMD Opteron™ Support for Red Hat Enterprise Linux®

· RHEL 5.4: new features and support
  · Supports Six-Core AMD Opteron™ processors
  · AMD-Vi (IOMMU) on SR5690-enabled systems
  · KVM virtualization support
· RHEL 6.0: new features and support
  · 1 GB huge page support
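A quick way to see which of these hardware features a RHEL host exposes is to look at the CPU flags in /proc/cpuinfo. The Linux kernel reports AMD-V as `svm`, Rapid Virtualization Indexing (nested paging) as `npt`, and 1 GB page support as `pdpe1gb`. The helper names below are our own, and a flag being present does not by itself mean the running kernel or hypervisor uses the feature.

```python
from __future__ import annotations

FEATURES = {
    "svm": "AMD-V (hardware virtualization)",
    "npt": "Rapid Virtualization Indexing (nested page tables)",
    "pdpe1gb": "1 GB pages",
}


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text: str) -> dict[str, bool]:
    """Map each feature description to whether its flag is present."""
    flags = cpu_flags(cpuinfo_text)
    return {desc: flag in flags for flag, desc in FEATURES.items()}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not a Linux host: every feature reports "no"
    for desc, present in report(text).items():
        print(f"{desc}: {'yes' if present else 'no'}")
```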


RHEL 5.4 Performance Testing on Six-Core AMD Opteron™ "Istanbul"

· Bare-metal scalability testing with an Oracle OLTP workload
· Multiple-instance testing with an OLTP workload
· Taking advantage of NUMA
· KVM multi-guest testing with an Oracle OLTP workload
· KVM multi-guest testing with a Sybase OLTP workload


Hardware Configuration

System: 4-socket Six-Core AMD Opteron(tm) Processor 8431 @ 2400.099 MHz (24 cores), 64 GB memory

Storage: HP HSV300; Fusion IO SSD device


Scaling with Oracle OLTP Workload

[Chart: transactions per minute (0 to 450,000) vs. number of users (10U, 20U, 40U, 60U, 80U, 100U) for two series, RHEL54-FC and RHEL54-SSD]

The graph shows scaling with the Oracle OLTP workload (running in batch commit mode). Scaling improves with storage that has lower latency and higher throughput.


KVM - 2 vCPU Multi-Guest - Oracle OLTP

24-CPU Istanbul, 64 GB

[Chart: transactions per minute (0 to 250,000) for 1 guest (2 vCPUs), 2 guests (4 vCPUs), 4 guests (8 vCPUs) and 8 guests (16 vCPUs). Secondary-axis data labels (0 to 800): 100, 216.15, 398.53 and 742.56, in order of increasing guest count.]

Scaling with multiple 2-vCPU guests running the Oracle OLTP workload: near-linear scaling.
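How close to linear is "near-linear"? Assuming the secondary-axis data labels are throughput relative to the single-guest run (1 guest = 100), scaling efficiency is the relative throughput divided by 100 x the number of guests. That reading of the labels is our assumption, not something the slide states.

```python
# Data labels from the 2-vCPU Oracle OLTP chart, keyed by guest count.
# ASSUMPTION: values are throughput relative to the 1-guest run (= 100).
RELATIVE_THROUGHPUT = {1: 100.0, 2: 216.15, 4: 398.53, 8: 742.56}


def scaling_efficiency(relative: float, guests: int, base: float = 100.0) -> float:
    """Fraction of ideal linear scaling achieved (1.0 = perfectly linear)."""
    return relative / (base * guests)


if __name__ == "__main__":
    for guests, rel in RELATIVE_THROUGHPUT.items():
        print(f"{guests} guest(s): {scaling_efficiency(rel, guests):.1%} of linear")
```

Under that reading, 8 guests reach about 93% of linear (742.56 / 800), and the 2-guest point is slightly above linear.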


KVM - 4 vCPU Multi-Guest - Oracle OLTP

24-CPU Istanbul, 64 GB

[Chart: transactions per minute (0 to 300,000), with a secondary axis from 0 to 500, for 1 guest (4 vCPUs, 8 GB), 2 guests (8 vCPUs, 16 GB), 4 guests (16 vCPUs, 32 GB), 6 guests (24 vCPUs, 48 GB) and 8 guests (32 vCPUs, 64 GB, oversubscribed)]

4-vCPU multi-guest testing with the Oracle OLTP workload shows good scaling. The last bar shows no significant penalty from oversubscribing the CPUs.
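The oversubscription in the last bar follows from the host size on the hardware slide: 4 sockets x 6 cores = 24 physical CPUs and 64 GB of memory. A minimal sketch of the arithmetic (the function name is ours; hypervisor and host memory overhead are ignored):

```python
from __future__ import annotations

HOST_CPUS = 4 * 6      # 4 sockets x 6 cores (Opteron 8431 system)
HOST_MEMORY_GB = 64


def oversubscription(guests: int, vcpus_per_guest: int, mem_per_guest_gb: int) -> tuple[float, float]:
    """Return (vCPU : physical CPU ratio, guest memory : host memory ratio)."""
    return (guests * vcpus_per_guest / HOST_CPUS,
            guests * mem_per_guest_gb / HOST_MEMORY_GB)


if __name__ == "__main__":
    for guests in (1, 2, 4, 6, 8):  # 4-vCPU, 8 GB guests from the chart
        cpu_ratio, mem_ratio = oversubscription(guests, 4, 8)
        print(f"{guests} guests: vCPU ratio {cpu_ratio:.2f}, memory ratio {mem_ratio:.2f}")
```

Only the 8-guest run exceeds 1.0 on CPUs (32 / 24 ≈ 1.33), while its guest memory adds up to the full 64 GB of host RAM.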


KVM - 8 vCPU Multi-Guest - Oracle OLTP

Istanbul, 24 CPUs, 64 GB

[Chart: transactions per minute (0 to 300,000) for 1 guest (8 vCPUs, 14 GB), 2 guests (16 vCPUs, 28 GB), 3 guests (24 vCPUs, 42 GB) and 4 guests (32 vCPUs, 56 GB). Secondary-axis data labels (100 to 300): 100, 206.02, 277.85 and 271.99, in order of increasing guest count.]

8-vCPU multi-guest testing with the Oracle OLTP workload shows linear scaling. The last bar shows no significant penalty from oversubscribing the CPUs.


NUMA - Pinning with numactl

Istanbul, 24 CPUs, 64 GB memory

[Chart: transactions per minute (0 to 400,000), with a secondary axis from 0 to 450, for 1 guest (6 vCPUs, 14 GB), 2 guests (12 vCPUs, 28 GB), 3 guests (18 vCPUs, 42 GB) and 4 guests (24 vCPUs, 56 GB), each run without and with numactl pinning]

The platform shows good scaling without NUMA tuning (bars 1-4). Using numactl, linear scaling is achieved with multiple guests (bars 5-8).
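numactl binds a process (here, a guest's qemu-kvm process) to a node with options such as `--cpunodebind=<node>` and `--membind=<node>`, and `numactl --hardware` prints the node layout. The sketch below reads the same CPU-to-node topology from sysfs in Python so that pinning decisions can be scripted. It assumes a Linux host with `/sys/devices/system/node`; on this 4-socket system one would expect four nodes of six CPUs each, but the actual CPU numbering should be read rather than assumed.

```python
from __future__ import annotations

import glob
import os


def parse_cpulist(text: str) -> set[int]:
    """Parse the kernel's cpulist format, e.g. "0-5,12,14-15"."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def numa_nodes(sysfs: str = "/sys/devices/system/node") -> dict[int, set[int]]:
    """Map NUMA node id -> CPU ids, read from sysfs (Linux only; empty elsewhere)."""
    nodes: dict[int, set[int]] = {}
    for path in glob.glob(os.path.join(sysfs, "node[0-9]*", "cpulist")):
        node_dir = os.path.basename(os.path.dirname(path))  # e.g. "node0"
        with open(path) as f:
            nodes[int(node_dir[len("node"):])] = parse_cpulist(f.read())
    return nodes


if __name__ == "__main__":
    for node, cpus in sorted(numa_nodes().items()):
        print(f"node {node}: CPUs {sorted(cpus)}")
```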


NUMA - Pinning with taskset

RHEL 5.4 Multi-Instance Scaling

Oracle database workload

[Chart: number of transactions (0 to 600,000) for 1 instance, 2 instances, 4 instances, and 4 instances with NUMA pinning]

The platform supports NUMA. Pinning the 4 database instances to the 4 NUMA nodes gave a 10% performance improvement (compare bars 3 and 4).
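taskset (for example `taskset -c 0-5 <command>`) sets only CPU affinity, through the same sched_setaffinity system call that Python exposes as `os.sched_setaffinity`; memory then stays local only through the kernel's default policy of allocating on the node where a thread runs. A hypothetical launcher that spreads instances round-robin across nodes might look like this. The node map in the example is an assumed layout, and `start_instance.sh` is a placeholder, not part of the original test setup.

```python
from __future__ import annotations

import os
import subprocess


def plan_pinning(instances: int, nodes: dict[int, set[int]]) -> list[set[int]]:
    """Instance i gets every CPU of node (i mod number_of_nodes), in node-id order."""
    order = sorted(nodes)
    return [nodes[order[i % len(order)]] for i in range(instances)]


def launch_pinned(cmd: list[str], cpus: set[int]) -> subprocess.Popen:
    """Start cmd with its CPU affinity set in the child before exec (Linux only)."""
    return subprocess.Popen(cmd, preexec_fn=lambda: os.sched_setaffinity(0, cpus))


if __name__ == "__main__":
    # Assumed layout: 4 nodes x 6 CPUs with contiguous numbering.
    topology = {n: set(range(6 * n, 6 * n + 6)) for n in range(4)}
    for i, cpus in enumerate(plan_pinning(4, topology)):
        print(f"instance {i}: CPUs {sorted(cpus)}")
        # launch_pinned(["./start_instance.sh", str(i)], cpus)  # hypothetical script
```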


KVM - 4 vCPU Multi-Guest - Sybase OLTP

Istanbul, 24 CPUs, 64 GB

[Chart: transactions per minute (0 to 160,000) for 1 guest (4 vCPUs, 8 GB), 2 guests (8 vCPUs, 16 GB), 4 guests (16 vCPUs, 32 GB) and 6 guests (24 vCPUs, 48 GB)]

4-vCPU guests showed a scaling trend as more guests were added. Scaling was not linear because the workload was not tuned to run in a KVM guest.


KVM - 8 vCPU Multi-Guest - Sybase OLTP

Istanbul, 24 CPUs, 64 GB

[Chart: transactions per minute (0 to 140,000) for 1 guest (8 vCPUs, 14 GB), 2 guests (16 vCPUs, 28 GB) and 3 guests (24 vCPUs, 42 GB)]

8-vCPU guests showed a scaling trend as more guests were added. Scaling was not linear because the workload was not tuned to run in a KVM guest.


Conclusion

The Six-Core AMD Opteron™ processor "Istanbul" has shown:
· Good vertical scaling
  · Storage (low latency)
  · Memory (dense memory)
· Good horizontal scaling
  · Consolidation
  · Virtualization
  · Storage (low latency, high bandwidth)
  · Memory (dense memory)
  · NUMA


Conclusion (contd)

The Six-Core AMD Opteron™ processor "Istanbul":
· Retains the prior-generation innovations
· Adds new innovations
  · Six cores, HT Assist, higher HyperTransport™ 3.0 bandwidth
· Is optimized on RHEL, with new hardware features enabled
· Enables system consolidation in data centers

What is your data center story?


Questions?

Bhavna Sarathy [email protected] Sanjay Rao [email protected]


Backup


Four-Socket STREAM Performance Improvement with HT Assist (slide 10)

· 42 GB/s: 4 x Six-Core AMD Opteron™ processors ("Istanbul") Model 8435, Tyan Thunder n4250QE (S4985-E) motherboard, 32 GB (16 x 2 GB DDR2-800) memory, SuSE Linux® Enterprise Server 10 SP1 64-bit, HT Assist enabled
· 25.5 GB/s: 4 x Six-Core AMD Opteron™ processors ("Istanbul") Model 8435, Tyan Thunder n4250QE (S4985-E) motherboard, 32 GB (16 x 2 GB DDR2-800) memory, SuSE Linux® Enterprise Server 10 SP1 64-bit, HT Assist disabled
· 24 GB/s: 4 x Quad-Core AMD Opteron™ processors ("Shanghai") Model 8384, Tyan Thunder n4250QE (S4985-E) motherboard, 32 GB (16 x 2 GB DDR2-800) memory, SuSE Linux® Enterprise Server 10 SP1 64-bit
· 9 GB/s: 4 x Hex-Core Intel Xeon processors ("Dunnington") Model E7450, Supermicro X7QC3+ motherboard, 32 GB (16 x 2 GB DDR2-667 FB-DIMM) memory, SuSE Linux® Enterprise Server 10 SP1 64-bit


Disclaimer & Attribution

DISCLAIMER The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. ATTRIBUTION © 2009 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD CoolCore, AMD Opteron, AMD PowerNow!, AMD Virtualization, AMD-V, Dual Dynamic Power Management, and combinations thereof are trademarks of Advanced Micro Devices, Inc. HyperTransport is a licensed trademark of the HyperTransport Technology Consortium. Microsoft, Windows, and Windows Vista are registered trademarks of Microsoft Corporation in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.
