在以前的一篇文章中,我提到千万不能将Oracle数据库的global_name更新为空。这不,事儿来了。我的一个同事,提到了一个解决办法,不过那个办法实际上是一种不完全恢复的办法,如果没有备份,就行不通。如果没有备份,可以使用BBED来修改块来解决这个问题,不过使用bbed仍然比较麻烦。
下面是我一时心血来潮进行的一次测试。测试环境,10.2.0.4 for Linux AS 5.5。注意,不要在生产库上模仿。
首先UPDATE GLOBAL_NAME为空,COMMIT后以abort方式关闭数据库,以abort方式只是为了增加点难度。之后再启动数据库。
SQL> update global_name set global_name=''; 1 row updated. SQL> commit; Commit complete. SQL> shutdown abort ORACLE instance shut down. SQL> startup ORACLE instance started. Total System Global Area 209715200 bytes Fixed Size 1266632 bytes Variable Size 75500600 bytes Database Buffers 130023424 bytes Redo Buffers 2924544 bytes Database mounted. ORA-01092: ORACLE instance terminated. Disconnection forced
启动失败,不出意料出现ORA-600 [18062]错误:
Mon Sep 6 15:43:31 2010 Errors in file /oracle/app/oracle/admin/xty/udump/xty_ora_3149.trc: ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], [] QMNC started with pid=16, OS id=3151 Mon Sep 6 15:43:33 2010 Errors in file /oracle/app/oracle/admin/xty/udump/xty_ora_3149.trc: ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], [] Mon Sep 6 15:43:33 2010 Errors in file /oracle/app/oracle/admin/xty/udump/xty_ora_3149.trc: ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], []
下面来解决这个问题。
第1步,重启数据库到MOUNT状态:
[oracle@xty ~]$ sqlplus / as sysdba SQL*Plus: Release 10.2.0.4.0 - Production on Mon Sep 6 15:43:47 2010 Copyright (c) 1982, 2007, Oracle. All Rights Reserved. Connected to an idle instance. SQL> startup mount ORACLE instance started. Total System Global Area 209715200 bytes Fixed Size 1266632 bytes Variable Size 75500600 bytes Database Buffers 130023424 bytes Redo Buffers 2924544 bytes Database mounted.
第2步,在另一个窗口中,使用gdb
[oracle@xty ~]$ ps -ef | grep LOCAL oracle 3186 3156 0 15:43 ? 00:00:00 oraclexty (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))) oracle 3188 2978 0 15:44 pts/3 00:00:00 grep LOCAL [oracle@xty ~]$ gdb $ORACLE_HOME/bin/oracle 3186 GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5) ...(省略)... Reading symbols from /oracle/app/oracle/product/10.2.0/bin/oracle...(no debugging symbols found)...done. Attaching to program: /oracle/app/oracle/product/10.2.0/bin/oracle, process 3186 Reading symbols from /etc/libcwait.so...(no debugging symbols found)...done. Loaded symbols for /etc/libcwait.so Reading symbols from /oracle/app/oracle/product/10.2.0/lib/libskgxp10.so...(no debugging symbols found)...done. Loaded symbols for /oracle/app/oracle/product/10.2.0/lib/libskgxp10.so Reading symbols from /oracle/app/oracle/product/10.2.0/lib/libhasgen10.so...(no debugging symbols found)...done. ...(省略)... Reading symbols from /oracle/app/oracle/product/10.2.0/lib/libnnz10.so...(no debugging symbols found)...done. Loaded symbols for /oracle/app/oracle/product/10.2.0/lib/libnnz10.so Reading symbols from /usr/lib/libaio.so.1...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libaio.so.1 Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done. Loaded symbols for /lib/libm.so.6 Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done. [Thread debugging using libthread_db enabled] Loaded symbols for /lib/libpthread.so.0 Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/libnsl.so.1 Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib/libc.so.6 Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/libnss_files.so.2 0x0025d402 in __kernel_vsyscall () (gdb) break kokiasg Breakpoint 1 at 0xa3d404d (gdb) continue Continuing.
第3步,OPEN数据库:
SQL> alter database open;
约等一会儿,在alert日志里面可以看到:
Mon Sep 6 15:44:41 2010 SMON: enabling cache recovery SMON: enabling tx recovery Mon Sep 6 15:44:41 2010 Database Characterset is ZHS16GBK
在gdb的输出可以看到:
Breakpoint 1, 0x0a3d404d in kokiasg ()
第4步,在gdb那里中止OPEN:
(gdb) kill Kill the program being debugged? (y or n) y (gdb) quit
sqlplus会提示:
alter database open * ERROR at line 1: ORA-03113: end-of-file on communication channel
这一次,Instance并没有terminated。只是Server process被KILL了。
第5步,还原GLOBAL_NAME:
[oracle@xty ~]$ sqlplus / as sysdba SQL*Plus: Release 10.2.0.4.0 - Production on Mon Sep 6 15:45:09 2010 Copyright (c) 1982, 2007, Oracle. All Rights Reserved. Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options SQL> select status from v$instance; STATUS ------------ OPEN SQL> update global_name set global_name='XTY'; 1 row updated. SQL> commit 2 ; commit * ERROR at line 1: ORA-01109: database not open
虽然可以执行UPDATE,但是不能COMMIT。再试试能不能做DDL:
SQL> create table t1 ( a int); Table created.
成功了。
在另会一个会话中查看GLOBAL_NAME:
SQL> select * from global_name; GLOBAL_NAME -------------------------------------------------------------------------------- XTY
GLOBAL_NAME回来了。
这里通过DDL的隐式提交特性来UPDATE GLOBAL_NAME。其实还有更简单的办法:OCI主动断开连接时的自动提交。如果UPDATE之后,直接退出sqlplus,UPDATE GLOBAL_NAME的事务实际也提交了。看起来ORACLE这时只是不能执行显式的COMMIT语句。
第6步,重启数据库:
SQL> shutdown immediate ORA-00604: error occurred at recursive SQL level 1 ORA-01109: database not open SQL> shutdown abort ORACLE instance shut down. SQL> startup ORACLE instance started. Total System Global Area 209715200 bytes Fixed Size 1266632 bytes Variable Size 75500600 bytes Database Buffers 130023424 bytes Redo Buffers 2924544 bytes Database mounted. Database opened.
看起来只能以abort方式关闭数据库:
成功了,数据库起来了。没有数据丢失,没有使用备份。不需要基于时间点的恢复,不需要BBED,不需要Resetlog。
“看起来只能以abort方式关闭数据库:”这句话何解?
[回复]
老熊 回复:
9月 6th, 2010 at 5:25 下午
@boypoo, shutdown immediate那一句报了ORA-00604错误,shutdown失败呀。
[回复]
我测试了一下。用bbed恢复不了。大师能否把bbed也写出来。谢谢!!
[回复]
能ddl不就已经提交了吗?还需要做ddl trigger吗?
[回复]
老熊 回复:
9月 7th, 2010 at 10:15 上午
@koko, 对,你说的很对。
[回复]
在aix平台上,用那个工具代替gdb,你的步骤是基于open状态做的,现实是在aix平台上一旦设置global_name为空后,再也无无法打开了。也就无法执行update操作了。。
[回复]
老熊 回复:
9月 7th, 2010 at 10:43 上午
@wenxing, 在AIX上用dbx。关于你说的这个问题,请仔细阅读文章
[回复]
“我的一个同事,提到了一个解决办法,不过那个办法实际上是一种不完全恢复的办法,如果没有备份,就行不通。”
我尝试了一下这个方法,在没有备份的情况下,也是可以打开的。
不过我想不通,数据库还可以恢复到比当前SCN小的scn,怪?
[回复]
老熊 回复:
9月 7th, 2010 at 10:57 上午
@wenxing, 那种在UPDATE之后的块没有写到数据文件上就可以实现。
[回复]
在执行update global_name set global_name=”后
commit;
这个时候后台报600 18062,数据库没有shutdown,
但是已经无法登录了:
SQL>
SQL> conn / as sysdba
ERROR:
ORA-01075: you are currently logged on
Warning: You are no longer connected to ORACLE.
然后shutdwon后,在start mount
再open时,实例直接shutdown
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
我不知道是否你使用了gdb的关系导致数据库可以open起来,我去尝试一下。。
[回复]
老熊 回复:
9月 7th, 2010 at 11:01 上午
@wenxing, 这个测试最关键的地方就是使用gdb,使数据库能够处于open状态。
[回复]
果然可以,把在aix平台上的结果贴一下,感谢laoxiang,顺便问一下,为啥使用gdb,dbx就可以open起来?
在第一个窗口执行:
SQL> startup mount;
ORA-32004: obsolete and/or deprecated parameter(s) specified
ORACLE instance started.
Total System Global Area 1610612736 bytes
Fixed Size 2073192 bytes
Variable Size 654314904 bytes
Database Buffers 939524096 bytes
Redo Buffers 14700544 bytes
Database mounted.
在第二个窗口执行:
oracle@ccicdbsrv2:/oracle/admin/ccicdev/bdump/ps -ef | grep LOCAL|grep ccicdev
oracle 1687838 2371748 0 11:07:55 – 0:00 oracleccicdev (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle@ccicdbsrv2:/oracle/admin/ccicdev/bdump/dbx -a 1687838
Waiting to attach to process 1687838 …
Successfully attached to oracle.
回到第一个窗口执行:
Database mounted.
SQL> alter database open;
hang住了。。
新开第三个窗口:
SQL> conn / as sysdba
Connected.
SQL> select status from v$instance;
STATUS
————
OPEN
SQL> conn / as sysdba
Connected.
SQL> select status from v$instance;
STATUS
————
OPEN
SQL> update global_name set global_name=’CCICDEV';
1 row updated.
SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 – 64bit Production
With the Partitioning, OLAP and Data Mining options
oracle@ccicdbsrv2:/home/oracle/sqlplus /nolog
SQL*Plus: Release 10.2.0.3.0 – Production on Tue Sep 7 11:20:13 2010
Copyright (c) 1982, 2006, Oracle. All Rights Reserved.
SQL> conn / as sysdba
Connected.
SQL> select global_name from global_name;
GLOBAL_NAME
——————————————————————————–
CCICDEV
再中断dbx进程后,第一个窗口:
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
SQL> SQL> SQL> SQL
[回复]
“那种在UPDATE之后的块没有写到数据文件上就可以实现。”
我是这样测试的:
SQL>update global_name set global_name=”;
SQL>commit;
在alert日志中:
Errors in file /oracle/admin/ccicdev/udump/ccicdev_ora_536662.trc:
ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], []
然后我shutdown immediate–这个时候数据库应该写回到物理文件了吧?
–之后的步骤都是按照文全的步骤做的。。
[回复]
老熊 回复:
9月 8th, 2010 at 5:25 下午
@wenxing, shutdown immediate方式,内存中的数据已经写回到了数据文件。
[回复]
老熊:
我按照你的步骤在aix平台操作,发现数据open后又报ora-600 18062错误,接着还是shutdown了,但是他可以自动mount,你那里的环境有问题吗?
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=24, OS id=356806
Tue Sep 14 09:42:59 2010
Thread 1 advanced to log sequence 5
Thread 1 opened at log sequence 5
Current log# 1 seq# 5 mem# 0: /oradata/ccicdev/redo01.log
Successful open of redo thread 1
Tue Sep 14 09:42:59 2010
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Tue Sep 14 09:42:59 2010
ARC1: Becoming the ‘no FAL’ ARCH
ARC1: Becoming the ‘no SRL’ ARCH
Tue Sep 14 09:42:59 2010
ARC0: Becoming the heartbeat ARCH
Tue Sep 14 09:42:59 2010
SMON: enabling cache recovery
Tue Sep 14 09:43:00 2010
Successfully onlined Undo Tablespace 5.
Tue Sep 14 09:43:00 2010
SMON: enabling tx recovery
Tue Sep 14 09:43:00 2010
Database Characterset is AL32UTF8
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=25, OS id=2891958
Tue Sep 14 09:43:02 2010
Errors in file /oracle/admin/ccicdev/udump/ccicdev_ora_2293938.trc:
ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], []
Tue Sep 14 09:43:03 2010
Errors in file /oracle/admin/ccicdev/udump/ccicdev_ora_266640.trc:
ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], []
Tue Sep 14 09:43:03 2010
Errors in file /oracle/admin/ccicdev/udump/ccicdev_ora_2293938.trc:
ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], []
Tue Sep 14 09:43:03 2010
Errors in file /oracle/admin/ccicdev/udump/ccicdev_ora_2293938.trc:
ORA-00600: internal error code, arguments: [18062], [], [], [], [], [], [], []
Tue Sep 14 09:43:03 2010
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 2293938
ORA-1092 signalled during: alter database open…
Tue Sep 14 09:47:12 2010
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on.
IMODE=BR
ILAT =97
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.3.0.
System parameters with non-default values:
processes = 800
sessions = 885
__shared_pool_size = 620756992
__large_pool_size = 16777216
__java_pool_size = 16777216
__streams_pool_size = 0
sga_target = 1610612736
control_files = /oradata/ccicdev/control01.ctl, /oradata/ccicdev/control02.ctl, /oradata/ccicdev/control03.ctl
db_block_size = 8192
__db_cache_size = 939524096
compatible = 10.2.0.3.0
log_archive_dest_1 = LOCATION=/oradata/archlog/ccicdev
log_archive_format = %t_%s_%r.dbf
db_file_multiblock_read_count= 16
_allow_resetlogs_corruption= TRUE
_allow_terminal_recovery_corruption= TRUE
_corrupted_rollback_segments= _SYSSMU1
undo_management = AUTO
undo_tablespace = UNDOTBS2
remote_login_passwordfile= EXCLUSIVE
db_domain =
dispatchers = (PROTOCOL=TCP) (SERVICE=ccicdevXDB)
job_queue_processes = 10
background_dump_dest = /oracle/admin/ccicdev/bdump
user_dump_dest = /oracle/admin/ccicdev/udump
core_dump_dest = /oracle/admin/ccicdev/cdump
audit_file_dest = /oracle/admin/ccicdev/adump
db_name = ccicdev
open_cursors = 800
sql_trace = FALSE
pga_aggregate_target = 821035008
Deprecated system parameters with specified values:
sql_trace
End of deprecated system parameter listing
PMON started with pid=2, OS id=3428352
PSP0 started with pid=3, OS id=2695392
MMAN started with pid=4, OS id=1462656
DBW0 started with pid=5, OS id=2891966
LGWR started with pid=6, OS id=2662642
CKPT started with pid=7, OS id=2576400
SMON started with pid=8, OS id=1810502
RECO started with pid=9, OS id=1413338
CJQ0 started with pid=10, OS id=1220834
MMON started with pid=11, OS id=127044
Tue Sep 14 09:47:12 2010
starting up 1 dispatcher(s) for network address ‘(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))’…
MMNL started with pid=12, OS id=4493448
Tue Sep 14 09:47:13 2010
starting up 1 shared server(s) …
Tue Sep 14 09:47:13 2010
ALTER DATABASE MOUNT
Tue Sep 14 09:47:17 2010
Setting recovery target incarnation to 5
Tue Sep 14 09:47:17 2010
Successful mount of redo thread 1, with mount id 2879843617
Tue Sep 14 09:47:17 2010
Database mounted in Exclusive Mode
Completed: ALTER DATABASE MOUNT
[回复]
如果你操作正常,下面一行应该不会出现:
replication_dependency_tracking turned off (no async multimaster replication found)
[回复]
solaris10下测了一下,报错,哪里出了问题?
-bash-3.00$ ps -ef|grep LOCAL
oracle 4100 4069 0 15:04:05 ? 0:00 oracletjdrms14 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 4105 3986 0 15:04:49 pts/4 0:00 grep LOCAL
-bash-3.00$ gdb $ORACLE_HOME/bin/oracle 4100
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type “show copying”
and “show warranty” for details.
This GDB was configured as “sparc-sun-solaris2.10″…
(no debugging symbols found)
Attaching to program `/export/home/orahome/oracle/product/10.2.0/bin/oracle’, process 4100
/proc/4100: Value too large for defined data type.
do_attach: couldn’t save traced faults.
/export/home/orahome/oracle/4100: No such file or directory.
[回复]
老熊 回复:
9月 19th, 2010 at 4:28 下午
@jyc, 这应该是gdb的问题。
This GDB was configured as “sparc-sun-solaris2.10″…
看上去是编译成32位的。
应该用64位的gdb
[回复]
我在fedora 12上用oracle 11.2.0.1的版本测试的时候,
gdb:
(gdb)break kokiasg
Breakpoint 1 at 0x8c8ea13
(gdb) continue
Continuing.
Breakpoint 1, 0x08c8ea13 in kokiasg ()
alter中:
SMON: enabling tx recovery
Archived Log entry 178 added for thread 1 sequence 278 ID 0x4a7839fd dest 1:
Database Characterset is ZHS16GBK
这时候如果在gdb中kill
(gdb) kill
Kill the program being debugged? (y or n) y
(gdb) quit
alter.log中:
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Tue Oct 12 15:23:51 2010
PMON (ospid: 22523): terminating the instance due to error 397
Instance terminated by PMON, pid = 22523
Instance看起来也被中断了。
这时候sqlplus 是
SYS@orcl>select status from v$instance;
select status from v$instance
*
ERROR at line 1:
ORA-01034: ORACLE not available
Process ID: 0
Session ID: 0 Serial number: 0
sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Tue Oct 12 15:28:00 2010
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to an idle instance.
SYS@orcl>
上面如果没有在gdb中kill,设置了break后,alter database open,执行后hang住,从另外的窗口中select status from v$instance的时候是open的,再执行update成功。
这是oracle 版本的差别么,即10.2版本中这时候只中断Server process,而11.2中instance也terminated?还是我执行的问题?
[回复]
老熊 回复:
10月 12th, 2010 at 4:12 下午
这毕竟是比较internal的东西,Oracle官方是不支持的。随着版本的不同,内部行为肯定有所变化。
[回复]
break kokiasg
kokiasg是什么意思?怎么知道在这里break?
[回复]
老熊 回复:
10月 18th, 2010 at 2:07 下午
@cc, kokiasg其作用类似于检查一些数据库里面的基础数据,以保证数据库能够正常。所以知道了这一点,在这个地方break,就是阻止进一步的检查。
[回复]
“break kokiasg
kokiasg是什么意思?怎么知道在这里break?”
–这个才是关键了,但是一般人肯定不知道在哪里设置断点。
[回复]
老熊 回复:
1月 7th, 2011 at 2:24 下午
@甲骨虫虫, 这个来自Oracle Support internal的东西,所以这玩意儿只能照着做。
[回复]
(gdb) break kokiasg
Breakpoint 1 at 0xa3d404d
(gdb) continue
Continuing
这个啥意思哈,如何操作!
[回复]
[…] 老熊在“UPDATE GLOBAL_NAME为空之后的恢复”这篇文章里用的恢复方法中用到了gdb和kokiasg的特性,这种方法虽然很简洁,但是过于internal。 […]