NOTE: I have recently discovered that Oracle, hugepages, and NUMA are incompatible, at least on Linux. NUMA must be disabled to use hugepages with Oracle.
RAM is managed in 4k pages on 64-bit Linux. When memory sizes were limited and systems with more than 16G of RAM were rare, this was not a problem. However, as systems gained more memory, the number of memory pages increased and became less manageable. Hugepages make managing the large amounts of memory available in modern servers much less CPU intensive. In particular, with the number of memory pages reduced by a factor of 512 (2M hugepages versus 4k pages), the chance that a particular page pointer will be available in the processor cache goes up dramatically.
First, some caveats on using hugepages. Hugepages are not swappable, and the Oracle SGA must be either all hugepages or no hugepages. If you allocate hugepages for Oracle and do not allocate enough for the entire SGA, Oracle will not use any hugepage memory; if there is then not enough non-hugepage memory, your database will not start. Finally, enabling hugepages requires a server restart, so if you do not have the ability to restart your server, do not attempt to enable hugepages.
Oracle Metalink note 1134002.1 says explicitly that AMM (MEMORY_TARGET/MEMORY_MAX_TARGET) is incompatible with hugepages. However, I have found at least one blog that says that AMM is compatible with hugepages when using the USE_LARGE_PAGES parameter in 11g (where AMM is available). Until further confirmation is found, I do not recommend trying to combine hugepages with MEMORY_TARGET/MEMORY_MAX_TARGET.
There are both Oracle database settings and Linux OS settings that must be adjusted in order to enable hugepages. The Linux and Oracle settings of concern are below:
Linux OS settings:
/etc/sysctl.conf:
vm.nr_hugepages
kernel.shmmax
kernel.shmall
/etc/security/limits.conf:
oracle soft memlock
oracle hard memlock
Oracle Database spfile/init.ora:
SGA_TARGET
SGA_MAX_SIZE
MEMORY_TARGET
MEMORY_MAX_TARGET
USE_LARGE_PAGES
First, calculate the Linux OS settings. kernel.shmmax should be set to the size of the largest SGA_TARGET on the server plus 1G, to account for other processes. For a single instance with a 180G SGA, that would be 181G.
kernel.shmall should be set to the sum of the SGA_TARGET values divided by the page size (shmall is expressed in pages). Use the 'getconf PAGE_SIZE' command to get the page size in bytes. The standard page size on Linux x86_64 is 4096, or 4k.
oracle soft memlock and oracle hard memlock should be set to slightly less than the total memory on the server; I chose roughly 230G. Units are kB, so the number is 230000000. This is the total amount of memory Oracle is allowed to lock.
Now for the hugepage setting itself. vm.nr_hugepages is the total number of hugepages to be allocated on the system. The number of hugepages required can be determined by finding the maximum amount of SGA memory expected to be used by the system (normally the SGA_MAX_SIZE value, or the sum of them on a server with multiple instances) and dividing it by the hugepage size, which is 2048k, or 2M, on Linux x86_64. To account for Oracle process overhead, add five more hugepages. So, if we want to allow 180G of hugepages, we would use this equation: (180*1024*1024/2048)+5. This gives us 92165 hugepages for 180G. Note: I took a shortcut in this calculation by working in megabytes rather than the full byte counts. Written out fully in bytes, as initially described, the equation would be (180*1024*1024*1024)/(2048*1024), plus the five extra pages.
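The arithmetic above is easy to get wrong by hand, so here is a minimal shell sketch of the same calculations (the variable names are mine; the values are the 180G example from this article):

```shell
#!/bin/sh
# Hugepage sizing for a 180G SGA, 4k base pages, 2M (2048k) hugepages.
SGA_G=180                 # sum of SGA_MAX_SIZE across all instances, in G
PAGESIZE=4096             # from 'getconf PAGE_SIZE'
HPAGE_K=2048              # hugepage size in kB on Linux x86_64

# nr_hugepages: SGA in kB divided by the hugepage size, plus 5 for overhead
NR_HUGEPAGES=$(( SGA_G * 1024 * 1024 / HPAGE_K + 5 ))

# shmmax: largest SGA plus 1G, in bytes
SHMMAX=$(( (SGA_G + 1) * 1024 * 1024 * 1024 ))

# shmall: shmmax divided by the base page size (shmall is in pages)
SHMALL=$(( SHMMAX / PAGESIZE ))

echo "vm.nr_hugepages = $NR_HUGEPAGES"
echo "kernel.shmmax = $SHMMAX"
echo "kernel.shmall = $SHMALL"
```

Running this prints exactly the sysctl values used in the example below.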
In order to allow the Oracle database to use up to 180G for the SGA_TARGET/SGA_MAX_SIZE, below are the settings we would use for the OS:
/etc/security/limits.conf
oracle soft memlock 230000000
oracle hard memlock 230000000
/etc/sysctl.conf
vm.nr_hugepages = 92165
kernel.shmmax = 194347270144
kernel.shmall = 47448064
In the Oracle database there is a new parameter in 11gR2: USE_LARGE_PAGES, with possible values of 'true', 'only', and 'false'. 'True' is the default and gives the current behavior, 'false' means never use hugepages (only small pages), and 'only' forces the database to use hugepages; if insufficient pages are available, the instance will not start. Regardless of this setting, the instance must use either all hugepages or all small pages. According to some blogs, using this setting is what allows MEMORY_MAX_TARGET and MEMORY_TARGET to be used with hugepages. As I noted above, I have not yet verified this with a Metalink note.
Next, set SGA_TARGET and SGA_MAX_SIZE to the desired size. I generally recommend setting both to the same size. Oracle recommends explicitly setting the MEMORY_TARGET and MEMORY_MAX_TARGET to 0 when enabling hugepages. So these are the values in the spfile that we change:
USE_LARGE_PAGES=only
SGA_TARGET=180G
SGA_MAX_SIZE=180G
MEMORY_MAX_TARGET=0
MEMORY_TARGET=0
In order to verify that hugepages are being used, run this command:
cat /proc/meminfo | grep Huge
It will show HugePages_Total, HugePages_Free, and HugePages_Rsvd. HugePages_Rsvd is the number of hugepages that have been reserved but not yet touched; the pages actually in use are HugePages_Total minus HugePages_Free.
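Here is a small sketch of that check, run against a sample meminfo excerpt (the numbers in the sample are made up for illustration, since live values depend on your system); the in-use count is HugePages_Total minus HugePages_Free:

```shell
#!/bin/sh
# On a live system you would run the awk directly against /proc/meminfo.
# Here it parses a sample excerpt so the arithmetic is visible.
sample='HugePages_Total:   92165
HugePages_Free:     2165
HugePages_Rsvd:     1000
Hugepagesize:       2048 kB'

in_use=$(echo "$sample" | awk '
/HugePages_Total/ { total = $2 }
/HugePages_Free/  { free  = $2 }
END               { print total - free }')

echo "hugepages in use: $in_use"
```

With the sample values this reports 90000 pages in use.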
Note that this example uses Linux hugepage size of 2M (2048k). On Itanium systems the hugepage size is 256M.
These instructions should allow you to successfully implement hugepages on Linux. Note that everything is the same for Oracle 10gR2, with the exception that the USE_LARGE_PAGES parameter is unavailable.
March 15, 2012 at 03:31
Great article about how to configure Hugepages for Oracle on Linux. Maybe the best I’ve seen.
Clearly explained with only important information provided.
Regards,
Marko
March 15, 2012 at 08:58
Great post… I had a performance issue and got advice directly from the alert log:
****************** Large Pages Information *****************
Total Shared Global Region in Large Pages = 0 KB (0%)
Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB) (alloc incr 64 MB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB
RECOMMENDATION:
Total Shared Global Region size is 20 GB. For optimal performance, prior to the next instance restart increase the number
of unused Large Pages by at least 10241 2048 KB Large Pages (20 GB) system wide to get 100% of the Shared Global Region allocated with Large Pages
This post helped me properly configure my database and OS to use hugepages.
Thanx a lot
March 15, 2012 at 09:23
Thanks for the input, I knew that but didn't think to put it in my blog. Just remember that this will be the information for a single instance. If you have multiple instances you will have to add up the SGA sizes to calculate the size of your hugepage pool.
April 13, 2012 at 13:22
I think as of 11.2.0.3, the SGA will allocate from standard memory when hugepages are exhausted. Sorry, can't remember the details, but they should not be too hard to locate.
April 13, 2012 at 13:31
Jared-
This is only partially correct. If there are sufficient hugepages available, it will allocate hugepages. If there are not sufficient hugepages available it will attempt to allocate standard memory. But it must be either all hugepages or all standard; it cannot allocate a combination of the two.
April 22, 2012 at 04:38
Andrew,
Read again MOS Note 1392497.1 about the USE_LARGE_PAGES parameter: there seem to have been some changes with the 11.2.0.3 patchset confirming what Jared wrote above.
Regards
April 23, 2012 at 09:14
Jared and Giles – you are correct; Giles, thanks for pointing that note out to me. It looks like in 11.2.0.3 Oracle will use a combination of hugepages and standard pages. This strikes me as a bad idea, but it will happen. Jared, my apologies.
April 16, 2012 at 18:29
After setting vm.nr_hugepages on my Linux server, I've had to add another database. The value of vm.nr_hugepages was not high enough to cover the new database. How high can vm.nr_hugepages be set? I know vm.nr_hugepages cannot max out RAM. I'd like to leave some growth space in this value so I don't have to bug my SA to adjust it up and reboot.
April 17, 2012 at 11:09
vm.nr_hugepages can be set to as much memory as is available on the system. However, my general rule of thumb is that on Linux, Oracle can use up to 80% of the total system memory. So take 80% of the total memory, subtract the amount of memory you want for the PGA_AGGREGATE_TARGET, and that tells you how much memory you can give to hugepages. One caveat is that I have not experimented with determining whether there is an upper limit to how much memory the Linux OS needs. So on VLM systems, it is entirely possible the required OS memory is some fixed amount rather than 20% of the total system memory; e.g., on a 256G RAM system, it is very possible that Linux needs a flat 25G instead of the 50G or so that my 80% rule of thumb suggests. Do any Linux experts have input?
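As a sketch of that rule of thumb (the 256G total and the 25G PGA figure are just assumed example values):

```shell
#!/bin/sh
# 80% rule of thumb: hugepage pool = 80% of RAM minus PGA_AGGREGATE_TARGET.
TOTAL_G=256               # total system RAM in G (example value)
PGA_G=25                  # desired PGA_AGGREGATE_TARGET in G (assumed)

ORACLE_G=$(( TOTAL_G * 80 / 100 ))      # memory Oracle may use in total
HUGEPOOL_G=$(( ORACLE_G - PGA_G ))      # G left over for the hugepage pool
NR_HUGEPAGES=$(( HUGEPOOL_G * 1024 * 1024 / 2048 ))  # 2M pages

echo "hugepage pool: ${HUGEPOOL_G}G ($NR_HUGEPAGES pages of 2M)"
```

For these example numbers, that works out to a 179G pool of 91648 hugepages, leaving room for the PGA and the OS.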
April 22, 2012 at 04:51
Andrew,
you wrote : “I have recently discovered that Oracle, hugepages, and NUMA are incompatible, at least on Linux. NUMA must be disabled to use hugepages with Oracle.”
I found nothing on MOS concerning this incompatiblity.
Have you some references ?
And you disable NUMA at which level? Linux kernel or Oracle instance parameters?
—
Regards
April 23, 2012 at 09:16
Giles – there is no Metalink note on this that I have found. However, when a friend ran into issues with getting hugepages to work, he opened an SR and the first thing Oracle told him to do was to disable NUMA at the OS level. It can also be disabled in the DB, but Oracle didn't mention that.
May 18, 2012 at 12:02
I have a system with:
OS -> RHEL 4
DB -> Oracle 10gR2
RAM -> 12GB
cat /proc/cpuinfo | egrep “processor|physical\ id|core\ id|cpu\ cores”
processor : 0
physical id : 1
core id : 16
processor : 1
physical id : 0
core id : 0
processor : 2
physical id : 1
core id : 17
processor : 3
physical id : 0
core id : 1
processor : 4
physical id : 1
core id : 25
processor : 5
physical id : 0
core id : 9
processor : 6
physical id : 1
core id : 26
processor : 7
physical id : 0
core id : 10
processor : 8
physical id : 1
core id : 16
processor : 9
physical id : 0
core id : 0
processor : 10
physical id : 1
core id : 17
processor : 11
physical id : 0
core id : 1
processor : 12
physical id : 1
core id : 25
processor : 13
physical id : 0
core id : 9
processor : 14
physical id : 1
core id : 26
processor : 15
physical id : 0
core id : 10
cat /etc/sysctl.conf
…
kernel.shmmax = 8589934592
kernel.shmall = 3145728
vm.nr_hugepages = 3072
…
cat /etc/security/limits.conf
…
oracle soft memlock 4194304
oracle hard memlock 4194304
…
init.ora
…
*.sessions=500
…
*.pga_aggregate_target=1342177280
*.sga_max_size=8589934592
*.sga_target=6442450944
…
cat /proc/meminfo | grep Huge
HugePages_Total: 3072
HugePages_Free: 3072
Hugepagesize: 2048 kB
What more should I configure to use hugepages?
May 20, 2012 at 01:38
I would check your math; compare your hugepages to your memlock settings.
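To make that check concrete, here is the arithmetic with the numbers from the comment above; the memlock limit (in kB) must cover the entire hugepage pool:

```shell
#!/bin/sh
# From the comment above: 3072 hugepages of 2048 kB vs a memlock of 4194304 kB.
NR_HUGEPAGES=3072
HPAGE_K=2048
MEMLOCK_K=4194304

POOL_K=$(( NR_HUGEPAGES * HPAGE_K ))   # total hugepage pool in kB

echo "hugepage pool: ${POOL_K} kB, memlock limit: ${MEMLOCK_K} kB"
if [ "$POOL_K" -gt "$MEMLOCK_K" ]; then
    echo "memlock is too small: oracle cannot lock the whole pool"
fi
```

Here the pool (6291456 kB, 6G) exceeds the 4194304 kB memlock limit, and the 8G SGA_MAX_SIZE exceeds the 6G pool as well, so that instance cannot use hugepages.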
August 30, 2012 at 16:23
Thanks, great article.
Just thought the following point didn’t clearly come out in the article.
There’s no need to increase the shmall and shmmax, ***if***:
(1) the instances are already configured and running in the server
***AND***
(2) the SGA for any of the DBs are not increased (meaning, one is just configuring those instances to use hugepages instead of the default pagesize)
February 21, 2013 at 01:10
This is an awesome article. We are considering hugepages with SGA_MAX_SIZE and SGA_TARGET, and would drop MEMORY_TARGET and MEMORY_MAX_TARGET in favour of this on so-called shared clusters, where various projects will add their DBs. My only concern now is that it will be a mix of small, medium, and large instances. And since they will come from new projects, we have no clue yet what the largest SGA_TARGET will be. But that will be a challenge then.
Ty for sharing this post.
April 23, 2013 at 14:18
Hmm, @dbakerber, your formula seems to contradict the "Oracle Database 11g Release 2 on Red Hat Enterprise Linux 6 Deployment Recommendations" document, pages 13 and 14.
Either you or that document seems to be confusing 'page size' and 'hugepage size'… or I'm confused reading both of these things.
April 23, 2013 at 14:34
What formula are you talking about?
April 23, 2013 at 14:43
quote from that doc:
“The maximum size of a shared segment (shmmax) should be one-half the size of total memory. So for our 96 GB example, this parameter should be set to 96*1024^3/2 or 51,539,607,552; for 16 GB, shmmax, is proportionally smaller: 16*1024^3/2 = 8,589,934,592.”
April 23, 2013 at 14:46
This document is located at this ugly url http://www.redhat.com/rhecm/rest-rhecm/jcr/repository/collaboration/jcr:system/jcr:versionStorage/ee6fe0000a0526020f35498ae39e9939/12/jcr:frozenNode/rh:resourceFile
April 23, 2013 at 15:36
It is my understanding that shmmax sets the maximum size of a single shared memory segment, so the actual value is not critical as long as you don't try to allocate more memory than is on the server. I don't think this document was available when I set this value, and as I recall I used Oracle documentation to determine the shmmax value. So I expect there is more than one way to calculate it.
May 18, 2013 at 01:28
The only NUMA options for Linux of which I am aware are on the kernel command line, and there numa=off directs a completely NUMA-unaware policy for allocating memory to processes and for scheduling processes, irrespective of the socket on which the related entities reside. In other words, the cross-socket NUMA effects are at their maximum. Setting numa=on directs the kernel (if compiled with these features) to be more NUMA-aware and thus reduce cross-socket traffic. It's a bit hard to see how numa=on would adversely impact using hugepages, let alone cause Oracle to malfunction or not start, or whatever the issue was.
So I'd have to get more info before accepting that NUMA being enabled was really the problem.
May 19, 2013 at 07:18
My understanding of NUMA extends not much further than the translation of the acronym; however, as noted earlier, Oracle did ask that NUMA be turned off as one of the first steps in debugging problems with hugepages. Later, I also ran into problems and discovered that disabling NUMA fixed them. If, as you say, it doesn't actually disable NUMA but changes the way NUMA is handled, it could be that the way Oracle on Linux handles the setting is the problem.
November 22, 2013 at 07:39
Anyway, at least in 11.2.0.3, NUMA is disabled by default at the Oracle level, as "_enable_NUMA_support" is set to FALSE.
September 5, 2013 at 10:25
You saved my day!!!!
May 1, 2014 at 11:16
I noticed your setting in the limits.conf as
oracle soft memlock 230000000
oracle hard memlock 230000000
instead of what is in Oracle Support's doc on hugepages:
* soft ..
* hard ..
Which one is correct or does it really matter?
May 1, 2014 at 11:40
It doesn't really matter. It's just a limit. You might want to have a limit, you might not.
May 1, 2014 at 12:38
“* soft|hard …” means all users
“oracle soft|hard …” means these limits apply only to the user “oracle”.
May 1, 2014 at 12:56
Oh, sorry. I am not familiar with that particular syntax. But I think the Oracle doc is wrong if that is what it says. The owner of the Oracle database should be the only significant user of resources on the server. You pay Oracle licenses based on server horsepower; if another user is taking significant resources, you end up paying Oracle for licensing you cannot use.
May 1, 2014 at 13:07
Well, that’s totally up to you/dba/ops/server owner to decide how that server is going to be used.
That info is not “wrong”.
It assumes that since Oracle DB is the only significant process on that server, you might as well set it to *.
Also, newer versions of Oracle DB might use a different user than "oracle", so setting it to * ensures that the whole system can use hugepages.
May 1, 2014 at 14:20
That's what I thought (* means default entry, all users), but setting hugetlb_shm_group to the oracle group's GID in the /proc/sys/vm/hugetlb_shm_group file will ensure hugepages are only used by the oracle group.