Configuring Hugepages For Oracle on Linux

NOTE: I have recently discovered that Oracle, hugepages, and NUMA are incompatible, at least on Linux. NUMA must be disabled to use hugepages with Oracle.

RAM is managed in 4k pages in 64 bit Linux. When memory sizes were limited, and systems with more than 16G RAM were rare, this was not a problem. However, as systems get more memory, the number of memory pages increases and becomes harder to manage. Hugepages make managing the large amounts of memory available in modern servers much less CPU intensive. In particular, with 2M hugepages the number of memory pages drops by a factor of 512, so the chance that a particular page translation is already cached in the processor (in the TLB) goes up dramatically.

First, some caveats on using hugepages. Hugepages are not swappable, and Oracle SGA memory must be either all hugepages or no hugepages. If you allocate hugepages for Oracle and don’t allocate enough for the entire SGA, Oracle will not use any hugepage memory. If there is then not enough non-hugepage memory, your database will not start. Finally, enabling hugepages will require a server restart, so if you do not have the ability to restart your server, do not attempt to enable hugepages.

Oracle Metalink note 1134002.1 says explicitly that AMM (MEMORY_TARGET/MEMORY_MAX_TARGET) is incompatible with hugepages.  However, I have found at least one blog that says that AMM is compatible with hugepages when using the USE_LARGE_PAGES parameter in 11g (where AMM is available).  Until further confirmation is found, I do not recommend trying to combine hugepages with MEMORY_TARGET/MEMORY_MAX_TARGET. 

There are both Oracle database settings and Linux OS settings that must be adjusted in order to enable hugepages. The Linux and Oracle settings of concern are below:

Linux OS settings:

/etc/sysctl.conf:
vm.nr_hugepages
kernel.shmmax
kernel.shmall

/etc/security/limits.conf:
oracle soft memlock
oracle hard memlock

Oracle Database spfile/init.ora:
SGA_TARGET
SGA_MAX_SIZE
MEMORY_TARGET
MEMORY_MAX_TARGET
USE_LARGE_PAGES

First, calculate the Linux OS settings. kernel.shmmax should be set to the size of the largest SGA_TARGET on the server plus 1G, to account for other processes. Units are bytes. For a single instance with a 180G SGA, that would be 181G, or 194347270144 bytes.

kernel.shmall should be set to the sum of the SGA_TARGET values divided by the page size, so its units are pages rather than bytes. Use the ‘getconf PAGESIZE’ command to get the page size in bytes. The standard page size on Linux x86_64 is 4096 bytes, or 4k.
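The two rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `shm_settings` and its parameters are made up for this example, and the 1G headroom and 4096-byte page size are the values assumed in the text.

```python
GIB = 1024 ** 3  # bytes in 1G

def shm_settings(sga_targets_gib, page_size=4096, headroom_gib=1):
    """Return (shmmax in bytes, shmall in pages) for a list of SGA_TARGET sizes in G.

    Hypothetical helper: shmmax = largest SGA + headroom, shmall = sum of SGAs / page size.
    On a real server, page_size could come from os.sysconf("SC_PAGE_SIZE").
    """
    shmmax = (max(sga_targets_gib) + headroom_gib) * GIB
    shmall = sum(sga_targets_gib) * GIB // page_size
    return shmmax, shmall

shmmax, shmall = shm_settings([180])
print(shmmax)  # 194347270144
print(shmall)  # 47185920
```

Note that the kernel.shmall value listed in the final settings, 47448064, is slightly larger than this result because it is the shmmax figure (181G) divided by 4096. Either value is large enough to cover a 180G SGA.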

The oracle soft memlock and oracle hard memlock limits should be set to slightly less than the total memory on the server; I chose 230G. Units are kbytes, so the number is 230000000. This is the total amount of memory Oracle is allowed to lock, so it must be at least as large as the hugepage pool.

Now for the hugepage setting itself: vm.nr_hugepages is the total number of hugepages to be allocated on the system. The number of hugepages required can be determined by finding the maximum amount of SGA memory expected to be used by the system (normally the SGA_MAX_SIZE value, or the sum of them on a server with multiple instances) and dividing it by the size of a hugepage, which is 2048k, or 2M, on Linux. To account for Oracle process overhead, add five more hugepages. So, if we want to allow 180G of hugepages, we would use this equation: (180*1024*1024/2048)+5. This gives us 92165 hugepages for 180G. Note: I took a shortcut in this calculation by working in kbytes rather than bytes. To calculate the number in the way I initially described, the equation would be: (180*1024*1024*1024)/(2048*1024), plus the same five pages.
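As a rough sketch of the same calculation, here it is in Python. The function name `nr_hugepages` and its parameter names are invented for this example; the 2048k hugepage size and the five extra pages are the assumptions stated above.

```python
def nr_hugepages(sga_max_sizes_gib, hugepage_kb=2048, overhead_pages=5):
    """Hugepages needed for the given SGA_MAX_SIZE values (in G), plus overhead pages."""
    sga_kb = sum(sga_max_sizes_gib) * 1024 * 1024
    # Round up so a partial page still gets a whole hugepage.
    pages = -(-sga_kb // hugepage_kb)
    return pages + overhead_pages

print(nr_hugepages([180]))  # 92165
```

Passing several sizes, for example `nr_hugepages([100, 80])`, covers the multiple-instance case by summing them first.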

In order to allow the Oracle database to use up to 180G for the SGA_TARGET/SGA_MAX_SIZE, below are the settings we would use for the OS:

 

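/etc/security/limits.conf

oracle soft memlock 230000000
oracle hard memlock 230000000

/etc/sysctl.conf

vm.nr_hugepages = 92165
kernel.shmmax = 194347270144
kernel.shmall = 47448064

The kernel.shmmax value is 193273528320 bytes (180G) plus 1G. The kernel.shmall value shown is 194347270144 / 4096, so it includes the same 1G of headroom.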
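Before rebooting, it may be worth sanity-checking that these numbers agree with each other. The sketch below is one way to do that; both function names are hypothetical, and the checks simply restate the rules from the text (the memlock limit must cover the hugepage pool, and the pool must cover the SGA).

```python
def memlock_covers_hugepages(memlock_kb, nr_hugepages, hugepage_kb=2048):
    """True if the memlock limit (in kbytes) is at least the size of the hugepage pool."""
    return memlock_kb >= nr_hugepages * hugepage_kb

def sga_fits_in_pool(sga_bytes, nr_hugepages, hugepage_kb=2048):
    """True if the whole SGA (in bytes) fits inside the hugepage pool."""
    return sga_bytes <= nr_hugepages * hugepage_kb * 1024

# Values from the settings above: memlock 230000000 KB, 92165 hugepages, 180G SGA.
print(memlock_covers_hugepages(230000000, 92165))   # True
print(sga_fits_in_pool(180 * 1024 ** 3, 92165))     # True
```

If either check comes back False, the instance will either refuse to start (with USE_LARGE_PAGES=only) or fall back to small pages.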

 

In the Oracle database there is a new setting in 11gR2: USE_LARGE_PAGES, with possible values of ‘true’, ‘only’, and ‘false’. ‘True’ is the default and matches the existing behavior. ‘False’ means never use hugepages; use only small pages. ‘Only’ forces the database to use hugepages; if insufficient pages are available, the instance will not start. Regardless of this setting, the SGA must use either all hugepages or all small pages (as discussed in the comments, MOS note 1392497.1 indicates that 11.2.0.3 relaxes this and can mix the two). According to some blogs, using this setting is what allows MEMORY_MAX_TARGET and MEMORY_TARGET to be used with hugepages. As I noted above, I have not yet verified this with a Metalink note.

Next, set SGA_TARGET and SGA_MAX_SIZE to the desired size.  I generally recommend setting both to the same size.  Oracle recommends explicitly setting the MEMORY_TARGET and MEMORY_MAX_TARGET to 0 when enabling hugepages.  So these are the values in the spfile that we change:

USE_LARGE_PAGES=only
SGA_TARGET=180G
SGA_MAX_SIZE=180G
MEMORY_MAX_TARGET=0
MEMORY_TARGET=0

 

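In order to verify that hugepages are being used, run this command:

grep Huge /proc/meminfo

It will show HugePages_Total, HugePages_Free, and HugePages_Rsvd. HugePages_Total minus HugePages_Free is the number of hugepages actually in use. HugePages_Rsvd is the number of hugepages that have been reserved (promised to a process such as the Oracle instance) but not yet touched, so the number of pages committed is Total - Free + Rsvd.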
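If you want to script this check, something along the following lines should work. It is a minimal sketch: the function names are made up, and the sample text stands in for the real contents of /proc/meminfo (older kernels may not report HugePages_Rsvd at all, which the code treats as zero).

```python
import re

def hugepage_stats(meminfo_text):
    """Pull the HugePages_* counters and Hugepagesize (kB) out of /proc/meminfo text."""
    stats = {}
    for line in meminfo_text.splitlines():
        m = re.match(r"(HugePages_\w+|Hugepagesize):\s+(\d+)", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats

def pages_committed(stats):
    """Hugepages in use (Total - Free) plus those reserved but not yet touched (Rsvd)."""
    return (stats["HugePages_Total"] - stats["HugePages_Free"]
            + stats.get("HugePages_Rsvd", 0))

# Illustrative sample; on a real server use open("/proc/meminfo").read() instead.
sample = """HugePages_Total:   92165
HugePages_Free:    50000
HugePages_Rsvd:    40000
Hugepagesize:       2048 kB
"""
stats = hugepage_stats(sample)
print(pages_committed(stats))  # 82165
```

A result of 0 after the instance has started means the database is not using the hugepage pool at all.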

Note that this example uses Linux hugepage size of 2M (2048k).  On Itanium systems the hugepage size is 256M.

These instructions should allow you to successfully implement hugepages in Linux. Note that everything would be the same for Oracle 10gR2, with the exception that the USE_LARGE_PAGES parameter is unavailable.

30 Responses to “Configuring Hugepages For Oracle on Linux”

  1. Marko Sutic (@MarkoSutic) Says:

    Great article about how to configure Hugepages for Oracle on Linux. Maybe the best I’ve seen.

    Clearly explained with only important information provided.

    Regards,
    Marko

  2. Rodrigo Mufalani Says:

    Great post… I had a performance issue and got advice directly from the alert log.

    ****************** Large Pages Information *****************

    Total Shared Global Region in Large Pages = 0 KB (0%)

    Large Pages used by this instance: 0 (0 KB)
    Large Pages unused system wide = 0 (0 KB) (alloc incr 64 MB)
    Large Pages configured system wide = 0 (0 KB)
    Large Page size = 2048 KB

    RECOMMENDATION:
    Total Shared Global Region size is 20 GB. For optimal performance,prior to the next instance restart increase the number
    of unused Large Pages by atleast 10241 2048 KB Large Pages (20 GB) system wide to get 100% of the Shared Global Region allocated with Large pages

    This post helped me properly configure my database and OS to use hugepages.

    Thanx a lot

    • dbakerber Says:

      Thanks for the input, I knew that but didn’t think to put it in my blog. Just remember that that will be the information for the single instance. If you have multiple instances you will have to add up the sga sizes to calculate the size of your hugepage pool.

  3. Jared Says:

    I think as of 11.2.0.3, SGA will allocate from std memory when hugepages are exhausted. Sorry, can’t remember the details, but should not be too hard to locate.

    • dbakerber Says:

      Jared-
      This is only partially correct. If there are sufficient hugepages available, it will allocate hugepages. If there are not sufficient hugepages available it will attempt to allocate std memory. But it must be either all hugepages or all standard, it cannot allocate a combination of the two.

      • Gilles Says:

        Andrew,
        read MOS Note 1392497.1 about the use_large_pages parameter again: there seem to be some changes with the 11.2.0.3 patchset, confirming what Jared wrote above.
        Regards

      • dbakerber Says:

        Jared and Gilles, you are correct. Gilles, thanks for pointing that note out to me. It looks like in 11.2.0.3 Oracle will use a combination of hugepages and standard pages. This strikes me as a bad idea, but it will happen. Jared, my apologies.

  4. Patricia Says:

    After setting vm.nr_hugepages on my Linux server, I’ve had to add another database. The value of vm.nr_hugepages was not high enough to cover the new database. How high can vm.nr_hugepages be set? I know vm.nr_hugepages cannot max out RAM. I’d like to leave some growth space in this value so I don’t have to bug my SA to adjust it up and reboot.

    • dbakerber Says:

      vm.nr_hugepages can be set to as much memory as is available on the system. However, my general rule of thumb is that on Linux, Oracle can use up to 80% of the total system memory, so take 80% of the total memory, subtract the amount of memory you want for the PGA_AGGREGATE_TARGET, and that tells you how much memory you can give to hugepages. One caveat is that I have not experimented with determining if there is an upper limit to how much memory the Linux OS needs. So on VLM systems, it is entirely possible the required OS memory number may be some fixed amount as opposed to 20% of the total system memory. E.g., on a 256G RAM system, it’s very possible that Linux needs a flat 25G instead of the 50G or so that my 80% rule of thumb suggests. Do any Linux experts have input?

  5. Gilles Says:

    Andrew,
    you wrote : “I have recently discovered that Oracle, hugepages, and NUMA are incompatible, at least on Linux. NUMA must be disabled to use hugepages with Oracle.”

    I found nothing on MOS concerning this incompatibility.
    Do you have any references?
    And at which level do you disable NUMA? Linux kernel or Oracle instance parameters?

    Regards

    • dbakerber Says:

      Gilles- There is no Metalink note on this that I have found. However, when a friend ran into issues with getting hugepages to work, he opened an SR and the first thing Oracle told him to do was to disable NUMA at the OS level. It can also be disabled in the DB, but Oracle didn’t mention that.

  6. Daniel Nagel Says:

    I have a system with:

    OS -> RHEL 4
    DB -> Oracle 10gR2
    RAM -> 12GB

    cat /proc/cpuinfo | egrep "processor|physical\ id|core\ id|cpu\ cores"
    processor : 0
    physical id : 1
    core id : 16
    processor : 1
    physical id : 0
    core id : 0
    processor : 2
    physical id : 1
    core id : 17
    processor : 3
    physical id : 0
    core id : 1
    processor : 4
    physical id : 1
    core id : 25
    processor : 5
    physical id : 0
    core id : 9
    processor : 6
    physical id : 1
    core id : 26
    processor : 7
    physical id : 0
    core id : 10
    processor : 8
    physical id : 1
    core id : 16
    processor : 9
    physical id : 0
    core id : 0
    processor : 10
    physical id : 1
    core id : 17
    processor : 11
    physical id : 0
    core id : 1
    processor : 12
    physical id : 1
    core id : 25
    processor : 13
    physical id : 0
    core id : 9
    processor : 14
    physical id : 1
    core id : 26
    processor : 15
    physical id : 0
    core id : 10

    cat /etc/sysctl.conf

    kernel.shmmax = 8589934592
    kernel.shmall = 3145728
    vm.nr_hugepages = 3072

    cat /etc/security/limits.conf

    oracle soft memlock 4194304
    oracle hard memlock 4194304

    init.ora

    *.sessions=500

    *.pga_aggregate_target=1342177280
    *.sga_max_size=8589934592
    *.sga_target=6442450944

    cat /proc/meminfo | grep Huge
    HugePages_Total: 3072
    HugePages_Free: 3072
    Hugepagesize: 2048 kB

    What more should I configure to use huge pages?

  7. dbakerber Says:

    I would check your math, compare your hugepages to your memlock settings.

  8. jcnars Says:

    Thanks, great article.
    Just thought the following point didn’t clearly come out in the article.
    There’s no need to increase the shmall and shmmax, ***if***:
    (1) the instances are already configured and running in the server
    ***AND***
    (2) the SGA for any of the DBs are not increased (meaning, one is just configuring those instances to use hugepages instead of the default pagesize)

  9. Mathijs Bruggink Says:

    This is an awesome article. We are considering huge pages with sga_max_size and sga_target, and would drop memory_target and memory_max_target in favour of this from then on for so-called shared clusters, where various projects will add their DBs. My only concern now is that it will be a mix of small, medium, and large. And since they will come from new projects, there is no clue yet what the largest sga_target will be. But that will be a challenge for then.
    Ty for sharing this post.

  10. Mxx Says:

    Hmm, @dbakerber, your formula seems to contradict with “Oracle Database 11g Release 2 on Red Hat Enterprise Linux 6 Deployment Recommendations” document, pages 13 and 14.
    Either you or that document seems to be confusing ‘page size’ and ‘hugepage size’…Or I’m confusing reading both of these things. :/

  11. jg167 Says:

    The only NUMA options for Linux of which I am aware are for the kernel cmd line, and there numa=off directs a completely NUMA-unaware policy for allocating memory to processes, and scheduling processes irrespective of the socket on which the related entities reside. In other words maximum numa-ness. Setting numa=on directs the kernel (if compiled with these features) to be more NUMA-aware and thus reduce cross-socket traffic. It’s a bit hard to see how numa=on would adversely impact using huge pages, let alone cause Oracle to malfunction or not start or whatever the issue was.

    So I’d have to get more info before accepting that numa being enabled was really the problem.

    • dbakerber Says:

      My understanding of NUMA extends not much further than the translation of the acronym; however, as noted earlier, Oracle did ask that NUMA be turned off as one of the first steps in debugging problems with hugepages. Later, I also ran into problems and discovered that disabling NUMA fixed them. If, as you say, it doesn’t actually disable NUMA but changes the way NUMA is handled, it could be that the way Oracle on Linux handles the setting is the problem.

      • GP Says:

        Anyway, at least in 11.2.0.3, numa is disabled by default at the Oracle level as “_enable_NUMA_support” is set to FALSE.

  12. after Says:

    You saved my day!!!!

  13. sc Says:

    I noticed your setting in the limits.conf as
    oracle soft memlock 230000000
    oracle hard memlock 230000000

    instead of based on Oracle support’s doc on Huge Page
    * soft ..
    * hard ..

    Which one is correct or does it really matter?

    • dbakerber Says:

      It doesn’t really matter. It’s just a limit. You might want to have a limit, you might not.

    • Mxx Says:

      “* soft|hard …” means all users
      “oracle soft|hard …” means these limits apply only to the user “oracle”.

      • dbakerber Says:

        Oh, sorry. I am not familiar with that particular syntax. But I think the Oracle doc is wrong if that is what it says. The owner of the Oracle database should be the only significant user of resources on the server. You pay Oracle licenses based on server horsepower, if another user is taking significant resources you end up paying Oracle for licensing you cannot use.

      • Mxx Says:

        Well, that’s totally up to you/dba/ops/server owner to decide how that server is going to be used.
        That info is not “wrong”.
        It assumes that since Oracle DB is the only significant process on that server, you might as well set it to *.
        Also, a newer version of Oracle DB might be using a different user than “oracle”, so setting it to * ensures that the whole system can use huge pages.

  14. sc Says:

    That’s what I thought (* means default entry, all users), but setting hugetlb_shm_group to the oracle group GID in the /proc/sys/vm/hugetlb_shm_group file will ensure huge pages are only used by the oracle group.
