[ https://issues.apache.org/jira/browse/HBASE-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Busbey resolved HBASE-21551.
---------------------------------
Resolution: Fixed
Release Note:
<!-- markdown -->
### Summary
HBase clusters will experience Region Server failures from out-of-memory errors, caused by a resource leak, under any of the following conditions:
* User initiates Scan operations set to use the STREAM reading type
* User initiates Scan operations set to use the default reading type that read more than 4 times the block size of the column families involved in the scan (by default, 4 * 64 KiB = 256 KiB)
* Compactions run
### Root cause
For long-running scans, the Region Server process attempts to optimize access by using a different API geared towards sequential access. Due to an error introduced in HBASE-20704, in HBase 2.0+ the Region Server fails to release the related resources when those scans finish. The same optimization path is always used by HBase's internal file compaction process.
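The leak pattern can be illustrated with a small toy model. This is a sketch, not HBase code: the issue description reports that each stream read registers a reader in a concurrent collection (`streamReaders`) that is only emptied when the store file closes, and the class and method names below (`StoreFileModel`, `openStreamScanner`, `closeStreamScanner`) are hypothetical stand-ins. The "fixed" variant shows one plausible remedy (deregistering on scanner close), which is an assumption and not necessarily what the attached patches do.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the leak; not HBase code. All names here are hypothetical. */
final class StreamReaderLeakSketch {

    /** Stand-in for a per-scan stream reader that holds buffers and file handles. */
    static final class StreamReader {
        final byte[] buffer = new byte[1024];
    }

    /** Stand-in for a store file that tracks the stream readers opened against it. */
    static final class StoreFileModel {
        // A concurrent set plays the role of the streamReaders collection.
        private final Set<StreamReader> streamReaders = ConcurrentHashMap.newKeySet();
        private final boolean releaseOnScannerClose;

        StoreFileModel(boolean releaseOnScannerClose) {
            this.releaseOnScannerClose = releaseOnScannerClose;
        }

        /** Called once per stream scan: registers a new reader. */
        StreamReader openStreamScanner() {
            StreamReader reader = new StreamReader();
            streamReaders.add(reader);
            return reader;
        }

        /** Called when a scan finishes. The leaky variant never deregisters the reader. */
        void closeStreamScanner(StreamReader reader) {
            if (releaseOnScannerClose) {
                streamReaders.remove(reader);
            }
        }

        /** Called only when the store file itself is closed. */
        void closeStoreFile() {
            streamReaders.clear();
        }

        int trackedReaders() {
            return streamReaders.size();
        }
    }

    /** Runs the given number of open/close scan cycles and returns the readers still tracked. */
    static int readersLeftAfterScans(boolean releaseOnScannerClose, int scans) {
        StoreFileModel file = new StoreFileModel(releaseOnScannerClose);
        for (int i = 0; i < scans; i++) {
            StreamReader reader = file.openStreamScanner();
            file.closeStreamScanner(reader);
        }
        return file.trackedReaders();
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + readersLeftAfterScans(false, 10_000)); // leaky: 10000
        System.out.println("fixed: " + readersLeftAfterScans(true, 10_000));  // fixed: 0
    }
}
```

In the leaky variant every finished scan leaves one reader reachable from the store file, so heap usage grows with the number of scans until the file is closed.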
### Workaround
The impact of this error can be minimized by setting the config value `hbase.storescanner.pread.max.bytes` to MAX_INT (2147483647) to avoid the optimization for default user scans. Clients should also be checked to ensure they do not pass the STREAM read type to the Scan API. Note that this workaround will have a severe impact on performance for long scans.
Compactions always use this sequential optimized reading mechanism, so downstream users will need to periodically restart Region Server roles after compactions have happened.
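To make the threshold arithmetic concrete, here is a minimal sketch of the switch decision for default-type scans. It is a toy model, not HBase code: the method names are hypothetical, and the strict "greater than" comparison is an assumption based on the "read more than" wording above.

```java
/** Toy model of the pread-to-stream switch; not HBase code. Names are hypothetical. */
final class PreadSwitchSketch {

    /** Default threshold per the release note: 4 times the column family block size. */
    static long defaultPreadMaxBytes(long blockSizeBytes) {
        return 4L * blockSizeBytes;
    }

    /** True once a default-type scan has read more than the threshold (assumed strict). */
    static boolean wouldSwitchToStream(long bytesRead, long preadMaxBytes) {
        return bytesRead > preadMaxBytes;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024;                          // 64 KiB default block size
        long defaultLimit = defaultPreadMaxBytes(blockSize);  // 262144 bytes = 256 KiB
        long workaroundLimit = Integer.MAX_VALUE;             // the MAX_INT workaround value

        long oneMiB = 1024L * 1024;
        System.out.println(wouldSwitchToStream(oneMiB, defaultLimit));    // true
        System.out.println(wouldSwitchToStream(oneMiB, workaroundLimit)); // false
    }
}
```

In this model a 1 MiB scan crosses the default 256 KiB limit but stays on positional reads once the limit is raised to MAX_INT, which is why the workaround reduces exposure for user scans while leaving compactions unaffected.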
Post by Zheng Hu (JIRA)
Memory leak when use scan with STREAM at server side
----------------------------------------------------
Key: HBASE-21551
URL: https://issues.apache.org/jira/browse/HBASE-21551
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: Zheng Hu
Assignee: Zheng Hu
Priority: Blocker
Fix For: 3.0.0, 2.2.0, 2.1.2, 2.0.4
Attachments: HBASE-21551.v1.patch, HBASE-21551.v2.patch, HBASE-21551.v3.patch, heap-dump.jpg
{code}
RegionScannerImpl#initializeScanners
|---> HStore#getScanner
|----------> StoreScanner()
|-------> StoreFileScanner#getScannersForStoreFiles
|------> HStoreFile#getStreamScanner #1
{code}
In #1, we put the StoreFileReader into a concurrent hash map, streamReaders, but do not remove it from streamReaders until the store file is closed.
So if we scan with STREAM many times, the streamReaders hash map will grow without bound. We can see this in the attached heap-dump.jpg.
I found this bug while benchmarking scan performance with YCSB on a cluster (RS heap size of 50g): the RS frequently ran into long full GCs (~110 sec)....
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)