Monday, March 28, 2011

Installing Oracle Database 11.2 on IBM AIX 6.1 (Includes HACMP)

SWAP space recommendation from Oracle corp. for Oracle 11g Release 2

If RAM is between 1 GB and 2 GB, SAWP should be 1.5 times the size of RAM
If RAM is between 2 GB and 16 GB, SWAP should be equal to the size of RAM
If RAM is more than 16 GB, SWAP should be 16 GB

Check RAM
# /usr/sbin/lsattr -E -l sys0 -a realmem

Check swap space
# /usr/sbin/lsps -a


Operating System Requirements
The following, or later versions, of the operating systems are required for Oracle Database 11g Release 2 (11.2):
- AIX 5L V5.3 TL 09 SP1 ("5300-09-01"), 64 bit kernel
- AIX 6.1 TL 02 SP1 ("6100-02-01), 64-bit kernel

The following operating system filesets are required for AIX 6.1:
- bos.adt.base
- bos.adt.lib
- bos.adt.libm
- bos.perf.libperfstat - 6.1.2.1 or later
- bos.perf.perfstat
- bos.perf.proctools
- rsct.basic.rte
- rsct.compat.clients.rte
- xlC.aix61.rte -10.1.0.0 or later

1. To determine the distribution and version of AIX installed, enter the following command:

# oslevel -s

For AIX 6.1: If the operating system version is lower than AIX 6.1 Technology Level 2 SP 1, then upgrade your operating system to this, or a later, level.

2. To determine whether the required filesets are installed and committed, enter a command similar to the following:
# lslpp -l bos.adt.base bos.adt.lib bos.adt.libm bos.perf.perfstat \
bos.perf.libperfstat bos.perf.proctools rsct.basic.rte rsct.compat.clients.rte \
xlC.aix61.rte

3. To determine the supported kernel mode, enter a command similar to the
following:
# getconf KERNEL_BITMODE
64


The following, or later, patches are required for Oracle Database 11g Release 2 (11.2) for AIX Based Systems:


Authorized Problem Analysis Reports (APARs) for AIX 6L:
- IZ41855
- IZ51456
- IZ52319


# /usr/sbin/instfix -i -k "IZ51456 IZ52319 IZ41855"
Note: IZ52319 and IZ41855 might not displayed because these are included in Isome other installed APARs


Create OS Level Groups and User
  1. Check all OS requirements are met.
  2. Add dba OS level groups using “smit security” command. For HACMP, use “smit hacmp” to add the shared group for the cluster
  3. Add oracle user using “smit security” command. Select dba as Primary group for this user and /home/oracle as home directory. For HACMP, use “smit hacmp” to add the shared user for the cluster. Home directory is recommended to be on the local storage of the node.
  4. Reset the password for oracle user. For HACMP, use “smit security” to reset password for both nodes in HACMP.

Configure Shell Limits (Copied from Oracle documentation)
  1. Run “smit chuser” command and select “oracle” user to set the limits as follows.

Shell Limit (As Shown in smit)                Recommended Value
Soft FILE size                                                    -1 (Unlimited)
Soft CPU time                                                   -1 (Unlimited) Note: This is the default value.
Soft DATA segment                                           -1 (Unlimited)
Soft STACK size                                                -1 (Unlimited)
Soft Real Memory size                                       -1 (Unlimited)
Processes (per user)                                          -1 (Unlimited)
Note: This limit is available only in AIX 6.1 or later.
Refer to “Configure System Configuration Parameters” step for information on configuration of processes per user limits.

Edit /etc/security/limits file to set “rss = -1” under user “oracle” to set the “Soft Real Memory Size”. This file should have entries for “oracle” user as follows
oracle:
        fsize = -1
        data = -1
        stack_hard = -1
        stack = -1
        rss = -1

Configure System Configuration Parameters
Parameter                                                     Recommended Value
maxuprocs                                                        16348
ncargs                                                              128

  1. Execute “smit chgsys” command and make sure that value in “Maximum number of PROCESSES is equal or greater to the value of “maxuprocs” mentioned above. Also make sure that value in “ARG/ENV list size in 4K byte blocks is greater or equal to the value of “ncarg” mentioned above.
  2. Login as user “oracle” and execute command “ulimit -a” and confirm that limits have been set mentioned under “Configure Shell Limits.”

Checking Asynchronous Input Output Processes(Copied from IBM Document)

On AIX 6, the AIO device drivers are enabled by default. Increase the number of aioserver processes from the default value. The recommended value for aio_maxreqs is 64k (65536). Confirm this value.
Confirm the aio_maxreqs value using the procedure for your release:
AIX 6.1:
# ioo –o aio_maxreqs
aio_maxreqs = 65536

When performing an asynchronous I/O to a file system, note that each asynchronous I/O operation is tied to an asynchronous I/O server. Thus, the number of asynchronous I/O servers limits the number of concurrent synchronous I/O operations in the system.

The initial number of servers that are started during a system restart is determined by the minservers parameter. As concurrent asynchronous I/O operations occur, additional asynchronous I/O servers are started, up to a maximum of the value set in the maxservers parameter.

In general, to set the number of asynchronous I/O servers, complete the following procedure:
1. Adjust the initial value of maxservers to 10 times the number of disks that are to be used concurrently but no more than 80.
2. Monitor the performance effects on the system during periods of high I/O activity. If all AIO server processes are started, then increase the maxservers value. Also, continue to monitor the system performance during peak I/O activity to determine if there was a benefit from the additional AIO servers. Too many
asynchronous I/O servers increase memory and processor overload of additional
processes, but this disadvantage is small.

To monitor the number of AIO server processes that have started, enter the following:
# ps -ek|grep -v grep|grep –v posix_aioserver|grep -c aioserver
Note: Value could be “0” if there is not activity going on to the server


Tuning the AIX for Oracle
  1. Execute following commands while logged in as root user, for more details about this, see document “Tuning IBM AIX 5.3 and AIX 6.1 for Oracle Database.pdf”
# vmo -L lgpg_size 
# vmo -L lgpg_regions
# vmo -p -o lgpg_size=16777216  -o lgpg_regions=256
# vmo -L lgpg_size 
# vmo -L lgpg_regions
# vmo -p -o lru_file_repage=0 -o minperm%=3 -o maxperm%=90 -o  maxclient%=90
# vmo -r -o page_steal_method=1  
# bosboot -ad /dev/ipldevice
  1. Reboot the server

Miscellaneous Settings
  1. Check /etc/hosts file. It should contain the host name and alias in it.
  2. /etc/resolv.conf file should contain following entry

    nameserver      168.232.220.220
    domain  mydomain.com
  3. Check /etc/netsvc.conf. This file should contain following entry to make sure that /etc/hosts file is used first for name resolution

    host = local, bind, nis
  4. Set the xntpd for network time synchronization.
    1. Make entry of time serve in the /etc/ntp.conf as follows
      server oa-10.utac.com.sg prefer
    2. Use “smit xntpd” to start xntpd service.


Installation of Oracle 11G Release 2

  1. Set the environment variables for user oracle in .profile which could be found in /home/oracle directory. For HACMP, do this on both nodes.
    ###################################################
    ORACLE_HOSTNAME=ibmaixha01
    ORACLE_BASE=/oracle;export ORACLE_BASE
ORACLE_HOME=/oracle/product/11.2.0.1/db_1;export ORACLE_HOME
ORACLE_SID=mydb;export ORACLE_SID
ORACLE_TERM=vt100; export ORACLE_TERM
NLS_LANG=american_america.UTF8; export NLS_LANG
ORA_NLS33=$ORACLE_HOME/ocommon/nls/admin/data; export ORA_NLS33
LD_LIBRARY_PATH=$ORACLE_HOME/lib:$ORACLE_HOME/lib32:/lib:/usr/lib:/usr/local/lib
export LD_LIBRARY_PATH
PATH=$ORACLE_HOME/bin:$PATH:$HOME/bin
set -o vi###################################################
  1. Run the rootpre.sh script and root user. For HACMP, execute this script on both nodes of HACMP. This will also create a group hagsuser, add user oracle and root into this group on both nodes.
    Note: Do not run the rootpre.sh script if you have a later release
of the Oracle Database software already installed on this system.
Switch user to root:
$ su -
password:
#
Complete one of the following steps, depending on the location of the installation
files:
·        If the installation files are on disc, enter a command similar to the following, where directory_path is the disc mount point directory or the path of the db directory on the DVD:
# /directory_path/rootpre/rootpre.sh
·        If the installation files are on the hard disk, change directory to the Disk1 directory and enter the following command:
# ./rootpre.sh


  1. Set ORACLE_HOSTNAME environment variable appropriately if required.
  2. Login as oracle user and check if all environment variables are set correctly.
  3. Set DISPLAY environment variable if required.
  4. Run Oracle installer from the directory where software is placed.
    $ ./runInstaller
    Uncheck the box for security updates and click Next.
  5. Select “Install database software only”, and click Next.
  6. Select “Single database installation”, and click Next.
  7. Select appropriate option for installation and click Next.
  8. Enter ORACLE_BASE and ORACLE_HOME directories and click Next.
  9. Specify oracle inventory location as mentioned bellow and click Next.
  10. Click Next.
  11. Click on Finish to start installation
  12. Following screen shows the installation progress.
  13. Run “orainstRoot.sh” and then “root.sh” as root.
  14. Click Close to finish the installation.

21 comments:

  1. Hi Salman,
    Thanks for the post.
    please let me know the procedure for upgrading from 10g to 11g in AIX 6.1 HACMP

    ReplyDelete
  2. Thanks for your comment. I am sorry but I don't have any of my personal document of upgrade from 10g to 11.2. You can give try to the standard upgrade documet on your test environment first.
    Thanks

    ReplyDelete
  3. Hi Salman,
    Nice post. I am in the process of Installing Oracle 11.2.0.3 on HACMP, so your document seems to be of major assistance. It is indeed my first time so I have some basic questions. I am waiting for the AIX admin to complete the HACMP install. My question is how stable is HACMP in terms of node failures for the Oracle DB?
    Do we install Oracle on both nodes? How do we startup the Oracle DBA during active node failure?
    Thanks!

    ReplyDelete
  4. Hi "Unknown" :)
    I have seen IBM Power Series Servers with AIX the best in stability. Currently I am in another company, and worked on AIX in my previous company, but, never saw any failure which could have done the migration of resources to the second node of cluster. But you should always be ready for the worst.
    You will be installing the Oracle on one node only but you should set the environment on both nodes same (.profile file of oracle user and any other backup scripts etc). Installation of software should be done on the file system which is on the shared storage and will be mounted on the second node in case of failover. So all your software file system and database file filesystem will be mounted to the second node automatically.

    ReplyDelete
  5. I have re installed the software on the shared disk but the db start and dbshut script does not execute accordingly.

    ReplyDelete
  6. This comment has been removed by the author.

    ReplyDelete
  7. You should keep following things in mind
    1) Test your script by executing it at command prompt and see if it works 100% fine and startsup/shuts down the oracle resources.
    2) There are log files of AIX HACMP. Do a test failover and during this process, monitor the HACMP log and see if there is any error message in the log files when it executed the oracle scripts.
    3) You need to set all environment variables for oracle, correctly in your scripts

    Salman

    ReplyDelete
  8. Thanks, Salman!
    I have taken the start/shut scripts from $ORACLE_HOME/bin/dbstart and dbshut accordingly. Manually it runs 100% fine. But it does not work during the test failover. I have installed the software on the shared disk, setup the .profile file correctly on both nodes. From what I understand is, if I failover, the new node must start the instance as well as the listener as soon as the shared disk stabilises.

    Regards
    Mohammed

    ReplyDelete
  9. Hi Muhammad,
    I never used this script. You should write your own script which is very simple and will look like the following

    ORACLE_HOME=/u01/app/oracle/product/11.20/db_1; export ORACLE_HOME
    PATH=/u01/app/oracle/product/11.20/db_1/bin; export PATH:$PATH
    sqlplus /nolog << EOF
    CONNECT '/ as sysdba'
    startup
    exit
    EOF

    lsnrctl start

    ReplyDelete
  10. Great. Thats a simple one. Now my listener is configured with the cluster service IP and I have configured my windows client with tns entries with the same service IP but it fails to connect.

    ReplyDelete
  11. Try to provide as much details as you can. Are you trying to connect after fail over or before failover? If before failover, what is the error message. If after failover, did you try if listener is up and what is its status and what is error message at client side?

    ReplyDelete
  12. Hello Salman, Sorry was busy with some other projects. Actually, this is out first time installing Oracle single node on a HACMP cluster. Me as well as our AIX admin aint sure how exactly it works. Probably, our architect has done a mess up as I feel HACMP is not at all required when we can have Oracle RAC for high availability. Anyway, the design is approved now so we need to configure it successfully. My understanding is that when a node of AIX HACMP has some issue, it fails over to the secondary node shutting down the DB instance as well as the listener and via the auto startup script, the secondary node must bring up the instance as well as listener. Now, What about the client machines trying to connect to the DB server, do we need to change their IPs each time since both AIX nodes have their own respective IPs. Or is it fine if we configure the client tns entries with the Cluster IP or Service label IP? Please advise appropriately.

    Regards
    Mohammed

    ReplyDelete
  13. You are absolutely right.
    1. In HACMP, Primary node executes some scripts before failover which should be used to shutdown the database and listener and during failover, shared file systems (hosting ORACLE_HOME and Oracle files datafiles/controlfile/redolog files)
    2. Backup node executes scripts to start the database and listener. Here listener is listening on the service IP of HACMP which is failed over to this backup node. So on client side, you should use the IP where listener is listening which is the service IP of HACMP and will remain available even after the failover.

    Salman

    ReplyDelete
  14. Cool, now my problem is as of below:
    Node1 = 10.154.128.30
    Node2 = 10.154.128.31
    Cluster Service IP = 10.154.128.32

    I am able to login to the active node via the cluster service IP. I have configured the DB listener using the cluster service IP. I am able to tnsping the cluster service IP (configured on tnsnames.ora entry) from any of the client machines. BUT I am not able to connect to the Database using the cluster service IP.

    # su - oracle
    $ sqlplus /nolog

    SQL*Plus: Release 11.2.0.3.0 Production on Wed Jul 11 11:27:29 2012

    Copyright (c) 1982, 2011, Oracle. All rights reserved.

    SQL> conn sys/********@ppccad as sysdba
    ERROR:
    ORA-12514: TNS:listener does not currently know of service requested in connect
    descriptor


    TNSNAMES.ORA contents.............

    ppccad =
    (DESCRIPTION =
    (LOAD_BALANCE=ON)
    (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 10.154.128.32)(PORT = 1521))
    )
    (CONNECT_DATA =
    (SERVER = DEDICATED)
    (SERVICE_NAME = ppccad)
    (FAILOVER_MODE=(TYPE=select)(METHOD=basic))
    )
    )

    I am able to connect to the DB instance when I change the tnsnames.ora entry in the client machines to the specific node IP but not the cluster service IP. What could be the reason?

    ReplyDelete
  15. Show me your listener.ora entries. Can you also try to have a static registration of your database with the listener while using cluster IP and see how it works.
    Salman

    ReplyDelete
  16. I have configured everything now and the only problem now is that I am unable to connect the oracle user as sysdba from remote client machines. Remote_login_password parameter is set to 'exclusive' and also oracle OS user is under DBA group.

    ReplyDelete
  17. Whenever you put any questions, try to explain it fully with telling how/what you are doing and what is the error message you are receiving.
    Salman

    ReplyDelete
  18. Brother Salman. Finally, everything has been sorted out. The issue was with the password file name and thats the reason remote machine was not able to connect as sysdba.
    Thank you so much! Really appreciate all your help and assistance.

    ReplyDelete
  19. First of all hats to your attitude towards helping other without any reason.
    We are right now en counting situation in production environment, at the time of batch processing(Multiple batch DB sessions) running someday we are getting ORA-01013: user requested cancel of current operation error.
    It is still under analysis phase by DBA and IBM guys. we are using dedicated server. My doubt with problem is, it is because of number server processes sudden increased caused running DB process KILL event.
    Observation part
    (1) Number of sessions including INACTIVE session get increased to more then 4000 at the time of problem occurring, Normal Inactive and active session count is less then 3000.
    (2) Node 1 is less utilized compare to Node 2 of RAC database.
    (3) GRID SWAP is observed in database.
    my doubt is with even thought availability of good RAM and servers, due to not proper allocation of SGA and PGA distribution and due to number of processes get increased causing session kill to happen at OS level.
    Can you provide your valuable feedback with above situation in order to identify and resolve problem ?

    ReplyDelete
  20. There is nothing like "DB process KILL event".
    If your RAM/SWAP depletes, your sessions/database will not respond and hangour and this may make your application to issue a timeout at application and abort. So probably you can check timeout setting of your application for this.
    Again note that no session can be killed automatically by database or OS unless you specify some program/timeout to do this.
    Salman

    ReplyDelete