关于log file sync,这对于每一个oracle dba甚至业务开发人员都应该非常熟悉的一个前台等待事件,关于它的诊断在mos中有非常多的文档可供参考,做了8年的oracle dba居然没有对该等待事件做个总结,实在惭愧,本文就跟它做个了断。。。
log file sync是一个非常重要的性能指标,而且由于oracle的一些机制,如cache fusion、fast commit cleanout等机制,它也是造成很多其他等待事件延迟高的元凶。比如gc等待、buffer busy wait。
What is a ‘log file sync’ wait?
When a user session commits, all redo records generated by that session’s transaction need to be flushed from memory to the redo logfile to insure changes to the database made by that transaction become permanent.
At the time of commit, the user session will post LGWR to write the log buffer (containing the current unwritten redo, including this session’s redo records) to the redo log file. Once LGWR knows that its write requests have completed, it will post the user session to notify it that this has completed. The user session waits on ‘log file sync’ while waiting for LGWR to post it back to confirm all redo it generated have made it safely onto disk.
The time between the user session posting the LGWR and the LGWR posting the user after the write has completed is the wait time for ‘log file sync’ that the user session will show.
一个log file sync的生命周期
当一个事务提交或者回滚时,前台进程将进入等待log file sync(除非设置commit_wait=nowait),直到完成整个提交流程,大致过程如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
1.Foreground process posts LGWR for commit ("log file sync" starts). If there are concurrent commit requests queued, LGWR will batch all outstanding commit requests together resulting in a continuous strand of redo. 2.LGWR waits for CPU 3.LGWR starts redo write ("redo write time" starts) 4.For RAC, LGWR broadcasts the current write to other instances 5.After preprocessing, if there is a SYNC standby, LGWR starts the remote write (“SYNC remote write” starts) 6.LGWR issues local write ("log file parallel write") 7.If there is a SYNC standby, LGWR waits for the remote write to complete 8.After checking the I/O status, LGWR ends "redo write time / SYNC remote write" 9.For RAC, LGWR waits for the broadcast ack 10.LGWR updates the on-disk SCN 11.LGWR posts the foregrounds 12.Foregrounds wait for CPU 13.Foregrounds ends "log file sync" |
俗话说,知己知彼才能百战百胜,只有清晰的理解了整个log file sync流程才能对其进行准确的诊断。
log file sync的诊断点
1) LGWR进程优先级
当事务commit/rollback时,LGWR需要获得CPU资源才能进行redo写,当CPU资源比较紧张时,尤其是cpu运行队列很长时,很可能导致LGWR获取不到CPU资源。
如果判断是该问题,解决方案有两种:
- LGWR绑核
- 通过参数“_high_priority_processes”调整LGWR进程优先级
1 2 3 4 5 6 7 8 9 10 11 |
SQL> @sp high_priority_processes -- show parameter by sp -- show hidden parameter by sp old 3: where x.indx=y.indx and ksppinm like '_%&p%' new 3: where x.indx=y.indx and ksppinm like '_%high_priority_processes%' NAME VALUE DESC ---------------------------------------- ---------- ------------------------------------------------------------------------------------------ _high_priority_processes LMS*|VKTM High Priority Process Name Mask |
2) I/O性能差或者链路抖动导致log file parallel write延迟高
这应该是大家都知道的一个性能关注点,对于log file parallel write的延迟,可以通过awr、lgwr trace或者大量的内部视图去诊断。
- DBA_HIST_SYSTEM_EVENT可以查看每个snapshot的log file parallel write的平均延迟
- DBA_HIST_EVENT_HISTOGRAM可以查看每个snapshot的log file parallel write的性能直方图,用于判断log file parallel write性能是否稳定
- V$EVENTMETRIC可以查看近1分钟内的log file parallel write平均延迟情况,我处理实时问题的时候比较喜欢用它
- V$IOFUNCMETRIC结合V$IOFUNCMETRIC_HISTORY可以观察到每分钟的LGWR读写平均延迟,可以发现I/O性能抖动情况。
- 还可以通过lgwr trace去查看是否有超时的告警,由参数”_long_log_write_warning_threshold“控制,默认500ms,超过500ms的将会在lgwr trace中输出
对于11.2.0.4及其以后版本,oracle还增加了一项统计“Redo Synch Time Overhead”
用于判断log file parallel write对log file sync影响程度
The statistic, ‘redo synch time overhead’, included in 11.2.0.4 and 12c, records the difference between ideal and actual, ‘log file sync’ times. If the difference is small, then large ‘log file sync’ waits may be attributed to outlier ‘log file parallel write’ times.
如果发现存在log file parallel write延迟过高或者不稳定的情况,需要结合awr和osw的iostat、os message、demsg去诊断I/O问题(磁盘、HBA、光纤/ib交换机、多路径、I/O流量、IOPS等等),此外还需要关注redo切换频率,是否redo太小。
3) rac之间broadcast on commit同步scn延迟高
对于rac,每次提交还需要将SCN同步到其他实例,此时log file sync的流程如下:
1 2 3 4 5 |
1. A user sessions issues a commit or rollback and starts waiting on log file sync. 2. LGWR gather redo information to be written to redo log files, issues IO and queues BOC to an LMS process and posts LMS process. 3. LGWR waits for redo buffers to be flushed to disk and SCN to be ACK'd 4. Completion of IO and receiving ACK from LMS signal write complete. LGWR then posts foreground process to continue. 5. Foreground process wakes up and log file sync wait ends. |
可以通过awr和内部统计视图查看BOC同步SCN的时间redo write broadcast ack time。redo write broadcast ack time/redo writes 可以得到平均每次boc scn ack的延迟,该延迟非常依赖Avg message sent queue time on ksxp (us),该统计通常小于1ms。直接可以反映私网的性能。
可以通过开启KST来跟踪BOC的过程。开启之后可以查看各个实例的LGWR、LMS trace去观测BOC过程
1 |
alter system set events 'trace[GCS_BSCN] disk=low |
rac其他相关的KST(可通过oradebug看到):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
RAC Real Application Clusters ((null)) GES Global Enqueue Service ((null)) KSI Kernel Service Instance locking (ksi) RAC_ENQ Enqueue Operations ((null)) DD GES Deadlock Detection ((null)) RAC_BCAST Enqueue Broadcast Operations ((null)) RAC_FRZ DLM-Client Freeze/Unfreeze (kjfz) KJOE DLM Omni Enqueue service (kjoe) GCS Global Cache Service (kjb) GCS_BSCN Broadcast SCN (kjb, kcrfw) GCS_READMOSTLY GCS Read-mostly (kjb) GCS_READER_BYPASS GCS Reader Bypass (kjb) GCS_DELTAPUSH GCS Delta Push (kjb) GSIPC Global Enqueue/Cache Service IPC ((null)) RAC_RCFG Reconfiguration ((null)) RAC_DRM Dynamic Remastering ((null)) RAC_MRDOM Multiple Recovery Domains ((null)) CGS Cluster Group Services (kjxg) CGSIMR Instance Membership Recovery (kjxgr) RAC_WLM Work Load Management (wlm) RAC_MLMDS RAC Multiple LMS (kjm) RAC_KA Kernel Accelerator (kjk) RAC_LT RAC Latch Usage ((null)) |
如果判断是BOC ack scn导致的问题,那么需要通过osw里面的netstat和检查交换机去分析是否是os参数不当或者交换机导致的私网性能问题。另外还需要关注的就是LMS优先级,不过在11.2.0.4及其以上版本,LMS默认就加入到”_high_priority_processes“参数中了。
4) 存在sync同步的standby,由于网络延迟导致
When SYNC is enabled we introduce a remote write (RFS write to a standby redo log) in addition to the normal local write for commit processing (unless using Fast Sync with Oracle Database 12c). This remote write, depending on network latency and remote I/O bandwidth, can make commit processing increase in time. Since commit processing is taking longer that means that we will see more sessions waiting on LGWR to finish its work and begin work on their commit request
可以通过一些等待事件去判断是否由于sync传输redo的性能问题导致的log file sync延迟高
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
WAIT EVENTS RELEVANT TO DATA GUARD 11.2 ARCH wait on ATTACH:Monitors the amount of time spent by all archiver processes to spawn a new RFS connection. ARCH wait on SENDREQ:Monitors the amount of time spent by all archiver processes to write archive logs to the local disk as well as write them remotely. ARCH wait on DETACH:Monitors the amount of time spent by all archiver processes to delete an RFS connection. LNS wait on ATTACH:Monitors the amount of time spent by all LNS processes to spawn a new RFS connection. LNS wait on SENDREQ:Monitors the amount of time spent by all LNS processes to write the received redo to disk as well as open and close the remote archived redo logs. LNS wait on DETACH:Monitors the amount of time spent by all LNS processes to delete an RFS connection. LGWR wait on LNS:Monitors the amount of time spent by the log writer (LGWR) process waiting to receive messages from LNS processes. LNS wait on LGWR:Monitors the amount of time spent by LNS processes waiting to receive messages from the log writer (LGWR) process. LGWR-LNS wait on channel:Monitors the amount of time spent by the log writer (LGWR) process or the LNS processes waiting to receive messages. WAIT EVENTS RELEVANT TO DATA GUARD FROM ORACLE DATABASE 12C ONWARD ARCH Remote Write:The time (in centi-seconds) that ARCn background processes spend blocked waiting for network write operations to complete. ASYNC Remote Write:The time (in centi-seconds) for asynchronous streaming RFSWRITE operations.This includes stall reaps and streaming network submission time. This time is accumulated by TTnn (Redo Transport Slave) background processes.. Redo Transport Attach:The time spent (in centi-seconds) doing Connect, Logon, and RFSATTACH for any network process. Redo Transport Close:The time spent (in centi-seconds) by ARCn, NSSn, and TTnn processes doing RFSCLOSE and RFSRGSTR operations. Redo Transport Detach:The time spent (in centi-seconds) doing RFSDETACH and Disconnect for any network process.. Redo Transport Open:The time spent (in centi-seconds) by ARCn, NSSn, and TTnn background processes doing RFSCREAT and RFSANNCE operations. Redo Transport Ping:The time spent (in centi-seconds) by ARCn background processes doing RFSPING operations. Redo Transport Slave Shutdown:The time spent (in centi-seconds) by LGWR doing NSSn and TTnn process shutdown and termination. Redo Transport Slave Startup:The time spent (in centi-seconds) by LGWR doing NSSn and TTnn process startup and initialization Redo Writer Remote Sync Complete:The time spent (in centi-seconds) by LGWR reaping completed network writes to remote destinations. Redo Writer Remote Sync Notify:The time spent (in centi-seconds) by LGWR issuing network writes to remote destinations. Remote SYNC Ping:The time spent (in centi-seconds) by the LGWR and NSSn background processes doing synchronous PING operations. SYNC Remote Write:The time spent by LGWR doing SYNC RFSWRITE operations. |
可以通过设置主备库的alter system set 16410 trace name context forever, level 16去观察发送/接收redo的过程。
如果发现是由于sync模式导致的log file sync过高,那么最好的办法就是sync改为async,sync对网络延迟和备库I/O要求都比较苛刻,也比较容易出现问题,今天才分析了一个由于大量TCP重传导致Redo Transport MISC 延迟过高加剧了log file sync延迟的问题。可以通过tcpdump去发现。
如果非要用sync的话,除了检查主备osw中netstat/iostat,判断os参数是否合理、网卡是否异常;I/O性能是否异常之外。建议参考sync模式的最佳实践来进行配置https://www.oracle.com/technetwork/database/availability/sync-2437177.pdf,如根据每秒redo生成量,合理的创建redo组大小;根据网络带宽和延迟,合理在TNS或者SQLNET层面设置合理的tcp socket buffer等等,详细设置在文档中,这里就不概述了。
5) 过于频繁的commit/rollback
mos上有一个判断是否频繁的标准:
In the AWR or Statspack report, if the average user calls per commit/rollback calculated as “user calls/(user commits+user rollbacks)” is less than 30, then commits are happening too frequently
6) Adaptive Log File Sync
There are 2 methods by which LGWR and foreground processes can communicate in order to acknowledge that a commit has completed:
1.Post/wait –
traditional method available in previous Oracle releases
LGWR explicitly posts all processes waiting for the commit to complete.
The advantage of the post/wait method is that sessions should find out almost immediately when the redo has been flushed to disk.
2.Polling –
Foreground processes sleep and poll to see if the commit is complete.
The advantage of this new method is to free LGWR from having to inform many processes waiting on commit to complete thereby freeing high CPU usage by the LGWR.
Initially the LGWR uses post/wait and according to an internal algorithm evaluates whether polling is better. Under high system load polling may perform better because the post/wait implementation typically does not scale well. If the system load is low, then post/wait performs well and provides better response times than polling. Oracle relies on internal statistics to determine which method should be used. Because switching between post/wait and polling incurs an overhead, safe guards are in place in order to ensure that switches do not occur too frequently
Adaptive Log File sync was introduced in 11.2. The parameter controlling this feature, _use_adaptive_log_file_sync, is set to false by default in 11.2.0.1 and 11.2.0.2.
In 11.2.0.3 the default is now true. When enabled, Oracle can switch between the 2 methods。
该参数可以自适应的调整前台进程收到lgwr反馈消息的方式,可以通过lgwr trace或者awr统计中的redo synch poll writes/redo synch polls来判断是否发生过切换,通常Polling模式会出现长时间的log file sync,所以该参数一般都建议关掉。
7) LGWR被阻塞
lgwr被阻塞,那么肯定就刷不了redo了,这种情况得根据实际情况结合ash、v$session、hanganalyze、systatedump等方式去分析。大概率是CF队列等待。
over,关于log file sync的总结就那么多了,有不足之处欢迎交流。