NOTE: I have recently discovered that Oracle, hugepages, and NUMA are incompatible, at least on Linux. NUMA must be disabled to use hugepages with Oracle.
RAM is managed in 4k pages on 64-bit Linux. When memory sizes were limited and systems with more than 16G of RAM were rare, this was not a problem. However, as systems gained more memory, the number of memory pages increased and became less manageable. Hugepages make managing the large amounts of memory available in modern servers much less CPU intensive. In particular, with the number of memory pages reduced by a factor of 512 (2M hugepages versus 4k pages), the chance that a particular page pointer will be available in the processor cache goes up dramatically.
First, some caveats on using hugepages. Hugepages are not swappable, and the Oracle SGA must be either all hugepages or no hugepages. If you allocate hugepages for Oracle and do not allocate enough for the entire SGA, Oracle will not use any hugepage memory; if there is then not enough non-hugepage memory, your database will not start. Finally, enabling hugepages requires a server restart, so if you do not have the ability to restart your server, do not attempt to enable hugepages.
Oracle Metalink note 1134002.1 says explicitly that AMM (MEMORY_TARGET/MEMORY_MAX_TARGET) is incompatible with hugepages. However, I have found at least one blog that says that AMM is compatible with hugepages when using the USE_LARGE_PAGES parameter in 11g (where AMM is available). Until further confirmation is found, I do not recommend trying to combine hugepages with MEMORY_TARGET/MEMORY_MAX_TARGET.
There are both Oracle database settings and Linux OS settings that must be adjusted in order to enable hugepages. The Linux and Oracle settings of concern are below:
Linux OS settings:
/etc/sysctl.conf:
vm.nr_hugepages
kernel.shmmax
kernel.shmall
/etc/security/limits.conf:
oracle soft memlock
oracle hard memlock
Oracle Database spfile/init.ora:
SGA_TARGET
SGA_MAX_SIZE
MEMORY_TARGET
MEMORY_MAX_TARGET
USE_LARGE_PAGES
First, calculate the Linux OS settings. kernel.shmmax should be set to the size of the largest SGA_TARGET on the server plus 1G, to account for other processes. For a single instance with a 180G SGA, that would be 181G.
kernel.shmall should be set to the sum of the SGA_TARGET values divided by the page size (shmall is expressed in pages). Use the 'getconf PAGE_SIZE' command to get the page size in bytes. The standard page size on Linux x86_64 is 4096, or 4k.
oracle soft memlock and oracle hard memlock should be set to slightly less than the total memory on the server; I chose roughly 230G. Units are kB, so the number is 230000000. This is the total amount of memory Oracle is allowed to lock.
Now for the hugepage setting itself. vm.nr_hugepages is the total number of hugepages to be allocated on the system. The number of hugepages required can be determined by finding the maximum amount of SGA memory expected to be used by the system (normally the SGA_MAX_SIZE value, or the sum of them on a server with multiple instances) and dividing it by the hugepage size, which is 2048k, or 2M, on Linux x86_64. To account for Oracle process overhead, add five more hugepages. So, if we want to allow 180G of hugepages, we would use this equation: (180*1024*1024/2048)+5. This gives us 92165 hugepages for 180G. Note: I took a shortcut in this calculation by working in megabytes rather than the full byte counts. Written out fully in bytes, as initially described, the equation would be (180*1024*1024*1024)/(2048*1024), plus the five extra pages.
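The arithmetic above is easy to get wrong by hand, so here is a minimal shell sketch of the same calculations (the variable names are mine; the values are the 180G example from this article):

```shell
#!/bin/sh
# Hugepage sizing for a 180G SGA, 4k base pages, 2M (2048k) hugepages.
SGA_G=180                 # sum of SGA_MAX_SIZE across all instances, in G
PAGESIZE=4096             # from 'getconf PAGE_SIZE'
HPAGE_K=2048              # hugepage size in kB on Linux x86_64

# nr_hugepages: SGA in kB divided by the hugepage size, plus 5 for overhead
NR_HUGEPAGES=$(( SGA_G * 1024 * 1024 / HPAGE_K + 5 ))

# shmmax: largest SGA plus 1G, in bytes
SHMMAX=$(( (SGA_G + 1) * 1024 * 1024 * 1024 ))

# shmall: shmmax divided by the base page size (shmall is in pages)
SHMALL=$(( SHMMAX / PAGESIZE ))

echo "vm.nr_hugepages = $NR_HUGEPAGES"
echo "kernel.shmmax = $SHMMAX"
echo "kernel.shmall = $SHMALL"
```

Running this prints exactly the sysctl values used in the example below.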
In order to allow the Oracle database to use up to 180G for the SGA_TARGET/SGA_MAX_SIZE, below are the settings we would use for the OS:
/etc/security/limits.conf
oracle soft memlock 230000000
oracle hard memlock 230000000
/etc/sysctl.conf
vm.nr_hugepages = 92165
kernel.shmmax = 194347270144
kernel.shmall = 47448064
In the Oracle database there is a new parameter in 11gR2: USE_LARGE_PAGES, with possible values of 'true', 'only', and 'false'. 'True' is the default and gives the current behavior, 'false' means never use hugepages (only small pages), and 'only' forces the database to use hugepages; if insufficient pages are available, the instance will not start. Regardless of this setting, the instance must use either all hugepages or all small pages. According to some blogs, using this setting is what allows MEMORY_MAX_TARGET and MEMORY_TARGET to be used with hugepages. As I noted above, I have not yet verified this with a Metalink note.
Next, set SGA_TARGET and SGA_MAX_SIZE to the desired size. I generally recommend setting both to the same size. Oracle recommends explicitly setting the MEMORY_TARGET and MEMORY_MAX_TARGET to 0 when enabling hugepages. So these are the values in the spfile that we change:
USE_LARGE_PAGES=only
SGA_TARGET=180G
SGA_MAX_SIZE=180G
MEMORY_MAX_TARGET=0
MEMORY_TARGET=0
In order to verify that hugepages are being used, run this command:
cat /proc/meminfo | grep Huge
It will show HugePages_Total, HugePages_Free, and HugePages_Rsvd. HugePages_Rsvd is the number of hugepages that have been reserved but not yet touched; the pages actually in use are HugePages_Total minus HugePages_Free.
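Here is a small sketch of that check, run against a sample meminfo excerpt (the numbers in the sample are made up for illustration, since live values depend on your system); the in-use count is HugePages_Total minus HugePages_Free:

```shell
#!/bin/sh
# On a live system you would run the awk directly against /proc/meminfo.
# Here it parses a sample excerpt so the arithmetic is visible.
sample='HugePages_Total:   92165
HugePages_Free:     2165
HugePages_Rsvd:     1000
Hugepagesize:       2048 kB'

in_use=$(echo "$sample" | awk '
/HugePages_Total/ { total = $2 }
/HugePages_Free/  { free  = $2 }
END               { print total - free }')

echo "hugepages in use: $in_use"
```

With the sample values this reports 90000 pages in use.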
Note that this example uses Linux hugepage size of 2M (2048k). On Itanium systems the hugepage size is 256M.
These instructions should allow you to successfully implement hugepages on Linux. Note that everything is the same for Oracle 10gR2, with the exception that the USE_LARGE_PAGES parameter is unavailable.
March 15, 2012 at 03:31
Great article about how to configure Hugepages for Oracle on Linux. Maybe the best I’ve seen.
Clearly explained with only important information provided.
Regards,
Marko
March 15, 2012 at 08:58
Great post… I had a performance issue and got advice directly from the alert log:
****************** Large Pages Information *****************
Total Shared Global Region in Large Pages = 0 KB (0%)
Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB) (alloc incr 64 MB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB
RECOMMENDATION:
Total Shared Global Region size is 20 GB. For optimal performance, prior to the next instance restart increase the number
of unused Large Pages by at least 10241 2048 KB Large Pages (20 GB) system wide to get 100% of the Shared Global Region allocated with Large Pages
This post helped me properly configure my database and OS to use hugepages.
Thanx a lot
March 15, 2012 at 09:23
Thanks for the input, I knew that but didn't think to put it in my blog. Just remember that this will be the information for a single instance. If you have multiple instances you will have to add up the SGA sizes to calculate the size of your hugepage pool.
April 13, 2012 at 13:22
I think as of 11.2.0.3, the SGA will allocate from standard memory when hugepages are exhausted. Sorry, can't remember the details, but they should not be too hard to locate.
April 13, 2012 at 13:31
Jared-
This is only partially correct. If there are sufficient hugepages available, it will allocate hugepages. If there are not sufficient hugepages available it will attempt to allocate standard memory. But it must be either all hugepages or all standard; it cannot allocate a combination of the two.
April 22, 2012 at 04:38
Andrew,
Read again MOS Note 1392497.1 about the USE_LARGE_PAGES parameter: there seem to have been some changes with the 11.2.0.3 patchset confirming what Jared wrote above.
Regards
April 23, 2012 at 09:14
Jared and Giles – you are correct; Giles, thanks for pointing that note out to me. It looks like in 11.2.0.3 Oracle will use a combination of hugepages and standard pages. This strikes me as a bad idea, but it will happen. Jared, my apologies.
April 16, 2012 at 18:29
After setting vm.nr_hugepages on my Linux server, I've had to add another database. The value of vm.nr_hugepages was not high enough to cover the new database. How high can vm.nr_hugepages be set? I know vm.nr_hugepages cannot max out RAM. I'd like to leave some growth space in this value so I don't have to bug my SA to adjust it up and reboot.
April 17, 2012 at 11:09
vm.nr_hugepages can be set to as much memory as is available on the system. However, my general rule of thumb is that on Linux, Oracle can use up to 80% of the total system memory. So take 80% of the total memory, subtract the amount of memory you want for the PGA_AGGREGATE_TARGET, and that tells you how much memory you can give to hugepages. One caveat is that I have not experimented with determining whether there is an upper limit to how much memory the Linux OS needs. So on VLM systems, it is entirely possible the required OS memory is some fixed amount rather than 20% of the total system memory; e.g., on a 256G RAM system, it is very possible that Linux needs a flat 25G instead of the 50G or so that my 80% rule of thumb suggests. Do any Linux experts have input?
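As a sketch of that rule of thumb (the 256G total and the 25G PGA figure are just assumed example values):

```shell
#!/bin/sh
# 80% rule of thumb: hugepage pool = 80% of RAM minus PGA_AGGREGATE_TARGET.
TOTAL_G=256               # total system RAM in G (example value)
PGA_G=25                  # desired PGA_AGGREGATE_TARGET in G (assumed)

ORACLE_G=$(( TOTAL_G * 80 / 100 ))      # memory Oracle may use in total
HUGEPOOL_G=$(( ORACLE_G - PGA_G ))      # G left over for the hugepage pool
NR_HUGEPAGES=$(( HUGEPOOL_G * 1024 * 1024 / 2048 ))  # 2M pages

echo "hugepage pool: ${HUGEPOOL_G}G ($NR_HUGEPAGES pages of 2M)"
```

For these example numbers, that works out to a 179G pool of 91648 hugepages, leaving room for the PGA and the OS.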
April 22, 2012 at 04:51
Andrew,
you wrote : “I have recently discovered that Oracle, hugepages, and NUMA are incompatible, at least on Linux. NUMA must be disabled to use hugepages with Oracle.”
I found nothing on MOS concerning this incompatiblity.
Have you some references ?
And you disable NUMA at which level? Linux kernel or Oracle instance parameters?
—
Regards
April 23, 2012 at 09:16
Giles – there is no Metalink note on this that I have found. However, when a friend ran into issues with getting hugepages to work, he opened an SR and the first thing Oracle told him to do was to disable NUMA at the OS level. It can also be disabled in the DB, but Oracle didn't mention that.
May 18, 2012 at 12:02
I have a system with:
OS -> RHEL 4
DB -> Oracle 10gR2
RAM -> 12GB
cat /proc/cpuinfo | egrep “processor|physical\ id|core\ id|cpu\ cores”
processor : 0
physical id : 1
core id : 16
processor : 1
physical id : 0
core id : 0
processor : 2
physical id : 1
core id : 17
processor : 3
physical id : 0
core id : 1
processor : 4
physical id : 1
core id : 25
processor : 5
physical id : 0
core id : 9
processor : 6
physical id : 1
core id : 26
processor : 7
physical id : 0
core id : 10
processor : 8
physical id : 1
core id : 16
processor : 9
physical id : 0
core id : 0
processor : 10
physical id : 1
core id : 17
processor : 11
physical id : 0
core id : 1
processor : 12
physical id : 1
core id : 25
processor : 13
physical id : 0
core id : 9
processor : 14
physical id : 1
core id : 26
processor : 15
physical id : 0
core id : 10
cat /etc/sysctl.conf
…
kernel.shmmax = 8589934592
kernel.shmall = 3145728
vm.nr_hugepages = 3072
…
cat /etc/security/limits.conf
…
oracle soft memlock 4194304
oracle hard memlock 4194304
…
init.ora
…
*.sessions=500
…
*.pga_aggregate_target=1342177280
*.sga_max_size=8589934592
*.sga_target=6442450944
…
cat /proc/meminfo | grep Huge
HugePages_Total: 3072
HugePages_Free: 3072
Hugepagesize: 2048 kB
What more should I configure to use hugepages?
May 20, 2012 at 01:38
I would check your math; compare your hugepages to your memlock settings.
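To make that check concrete, here is the arithmetic with the numbers from the comment above; the memlock limit (in kB) must cover the entire hugepage pool:

```shell
#!/bin/sh
# From the comment above: 3072 hugepages of 2048 kB vs a memlock of 4194304 kB.
NR_HUGEPAGES=3072
HPAGE_K=2048
MEMLOCK_K=4194304

POOL_K=$(( NR_HUGEPAGES * HPAGE_K ))   # total hugepage pool in kB

echo "hugepage pool: ${POOL_K} kB, memlock limit: ${MEMLOCK_K} kB"
if [ "$POOL_K" -gt "$MEMLOCK_K" ]; then
    echo "memlock is too small: oracle cannot lock the whole pool"
fi
```

Here the pool (6291456 kB, 6G) exceeds the 4194304 kB memlock limit, and the 8G SGA_MAX_SIZE exceeds the 6G pool as well, so that instance cannot use hugepages.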
August 30, 2012 at 16:23
Thanks, great article.
Just thought the following point didn’t clearly come out in the article.
There’s no need to increase the shmall and shmmax, ***if***:
(1) the instances are already configured and running in the server
***AND***
(2) the SGA for any of the DBs are not increased (meaning, one is just configuring those instances to use hugepages instead of the default pagesize)
February 21, 2013 at 01:10
This is an awesome article. We are considering hugepages with SGA_MAX_SIZE and SGA_TARGET, and would drop MEMORY_TARGET and MEMORY_MAX_TARGET in favour of this on so-called shared clusters, where various projects will add their DBs. My only concern now is that it will be a mix of small, medium, and large instances. And since they will come from new projects, we have no clue yet what the largest SGA_TARGET will be. But that will be a challenge then.
Ty for sharing this post.
April 23, 2013 at 14:18
Hmm, @dbakerber, your formula seems to contradict the "Oracle Database 11g Release 2 on Red Hat Enterprise Linux 6 Deployment Recommendations" document, pages 13 and 14.
Either you or that document seems to be confusing 'page size' and 'hugepage size'… or I'm confused reading both of these things.
April 23, 2013 at 14:34
What formula are you talking about?
April 23, 2013 at 14:43
quote from that doc:
“The maximum size of a shared segment (shmmax) should be one-half the size of total memory. So for our 96 GB example, this parameter should be set to 96*1024^3/2 or 51,539,607,552; for 16 GB, shmmax, is proportionally smaller: 16*1024^3/2 = 8,589,934,592.”
April 23, 2013 at 14:46
This document is located at this ugly url http://www.redhat.com/rhecm/rest-rhecm/jcr/repository/collaboration/jcr:system/jcr:versionStorage/ee6fe0000a0526020f35498ae39e9939/12/jcr:frozenNode/rh:resourceFile
April 23, 2013 at 15:36
It is my understanding that shmmax sets the maximum size of a single shared memory segment, so the actual value is not critical as long as you don't try to allocate more memory than is on the server. I don't think this document was available when I set this value, and as I recall I used Oracle documentation to determine the shmmax value. So I expect there is more than one way to calculate it.
May 18, 2013 at 01:28
The only NUMA options for Linux of which I am aware are on the kernel command line, and there numa=off directs a completely NUMA-unaware policy for allocating memory to processes and for scheduling processes, irrespective of the socket on which the related entities reside. In other words, the cross-socket NUMA effects are at their maximum. Setting numa=on directs the kernel (if compiled with these features) to be more NUMA-aware and thus reduce cross-socket traffic. It's a bit hard to see how numa=on would adversely impact using hugepages, let alone cause Oracle to malfunction or not start, or whatever the issue was.
So I'd have to get more info before accepting that NUMA being enabled was really the problem.
May 19, 2013 at 07:18
My understanding of NUMA extends not much further than the translation of the acronym; however, as noted earlier, Oracle did ask that NUMA be turned off as one of the first steps in debugging problems with hugepages. Later, I also ran into problems and discovered that disabling NUMA fixed them. If, as you say, it doesn't actually disable NUMA but changes the way NUMA is handled, it could be that the way Oracle on Linux handles the setting is the problem.
November 22, 2013 at 07:39
Anyway, at least in 11.2.0.3, NUMA is disabled by default at the Oracle level, as "_enable_NUMA_support" is set to FALSE.
September 5, 2013 at 10:25
You saved my day!!!!
May 1, 2014 at 11:16
I noticed your setting in the limits.conf as
oracle soft memlock 230000000
oracle hard memlock 230000000
instead of what is in Oracle Support's doc on hugepages:
* soft ..
* hard ..
Which one is correct or does it really matter?
May 1, 2014 at 11:40
It doesn't really matter. It's just a limit. You might want to have a limit, you might not.
May 1, 2014 at 12:38
“* soft|hard …” means all users
“oracle soft|hard …” means these limits apply only to the user “oracle”.
May 1, 2014 at 12:56
Oh, sorry. I am not familiar with that particular syntax. But I think the Oracle doc is wrong if that is what it says. The owner of the Oracle database should be the only significant user of resources on the server. You pay Oracle licenses based on server horsepower; if another user is taking significant resources, you end up paying Oracle for licensing you cannot use.
May 1, 2014 at 13:07
Well, that’s totally up to you/dba/ops/server owner to decide how that server is going to be used.
That info is not “wrong”.
It assumes that since Oracle DB is the only significant process on that server, you might as well set it to *.
Also, newer versions of Oracle DB might use a different user than "oracle", so setting it to * ensures that the whole system can use hugepages.
May 1, 2014 at 14:20
That's what I thought (* means default entry, all users), but setting hugetlb_shm_group to the oracle group's GID in the /proc/sys/vm/hugetlb_shm_group file will ensure hugepages are only used by the oracle group.