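The string table (StringTable) is essentially a hash table, and a hash table's performance depends heavily on its number of buckets. Tuning the StringTable therefore mostly comes down to adjusting the bucket count.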
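To make the relationship between bucket count and chain length concrete, here is a minimal sketch of a chained hash table's bucket distribution. It is a toy model only: the class `BucketSpreadSketch`, its methods, and the synthetic keys are assumptions made for illustration, and it uses `String.hashCode()` rather than the hash function HotSpot actually uses for its StringTable.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a chained hash table; not HotSpot's actual StringTable. */
class BucketSpreadSketch {

    /** Returns the number of keys that land in each bucket. */
    static int[] bucketSizes(List<String> keys, int bucketCount) {
        int[] sizes = new int[bucketCount];
        for (String key : keys) {
            // floorMod keeps the index non-negative even for negative hash codes.
            sizes[Math.floorMod(key.hashCode(), bucketCount)]++;
        }
        return sizes;
    }

    /** Length of the longest chain, i.e. the worst-case lookup cost. */
    static int maxChain(int[] sizes) {
        int max = 0;
        for (int size : sizes) {
            max = Math.max(max, size);
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            keys.add("word-" + i); // synthetic stand-in for the lines of a word list
        }
        for (int buckets : new int[] {1009, 60013, 200000}) {
            int[] sizes = bucketSizes(keys, buckets);
            System.out.printf("buckets=%d avg=%.3f max=%d%n",
                    buckets, (double) keys.size() / buckets, maxChain(sizes));
        }
    }
}
```

With the same keys, fewer buckets necessarily means longer chains, and every lookup or insert has to walk its chain. The measurement on a real JVM follows.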
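    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;

    /**
     * Demonstrates how the string pool size affects performance.
     * VM option: -XX:+PrintStringTableStatistics
     */
    public class Demo1_24 {
        public static void main(String[] args) throws IOException {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new FileInputStream("linux.words"), "utf-8"))) {
                String line = null;
                long start = System.nanoTime();
                while (true) {
                    line = reader.readLine();
                    if (line == null) {
                        break;
                    }
                    line.intern(); // add each line to the string pool
                }
                // Elapsed time in milliseconds
                System.out.println("cost:" + (System.nanoTime() - start) / 1000000);
            }
        }
    }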
Run the code above with that VM option. It prints the following.
    cost:439
    SymbolTable statistics:
    Number of buckets       :     20011 =    160088 bytes, avg   8.000
    Number of entries       :     13697 =    328728 bytes, avg  24.000
    Number of literals      :     13697 =    609024 bytes, avg  44.464
    Total footprint         :           =   1097840 bytes
    Average bucket size     :     0.684
    Variance of bucket size :     0.684
    Std. dev. of bucket size:     0.827
    Maximum bucket size     :         6
    StringTable statistics:
    Number of buckets       :     60013 =    480104 bytes, avg   8.000
    Number of entries       :    481494 =  11555856 bytes, avg  24.000
    Number of literals      :    481494 =  29750344 bytes, avg  61.788
    Total footprint         :           =  41786304 bytes
    Average bucket size     :     8.023
    Variance of bucket size :     8.084
    Std. dev. of bucket size:     2.843
    Maximum bucket size     :        23
With the option -XX:StringTableSize=200000 added, the output is as follows.
    cost:393
    SymbolTable statistics:
    Number of buckets       :     20011 =    160088 bytes, avg   8.000
    Number of entries       :     13697 =    328728 bytes, avg  24.000
    Number of literals      :     13697 =    609024 bytes, avg  44.464
    Total footprint         :           =   1097840 bytes
    Average bucket size     :     0.684
    Variance of bucket size :     0.684
    Std. dev. of bucket size:     0.827
    Maximum bucket size     :         6
    StringTable statistics:
    Number of buckets       :    200000 =   1600000 bytes, avg   8.000
    Number of entries       :    481494 =  11555856 bytes, avg  24.000
    Number of literals      :    481494 =  29750344 bytes, avg  61.788
    Total footprint         :           =  42906200 bytes
    Average bucket size     :     2.407
    Variance of bucket size :     2.420
    Std. dev. of bucket size:     1.556
    Maximum bucket size     :        12
With the option -XX:StringTableSize=1009, the output is as follows.
    cost:4870
    SymbolTable statistics:
    Number of buckets       :     20011 =    160088 bytes, avg   8.000
    Number of entries       :     16327 =    391848 bytes, avg  24.000
    Number of literals      :     16327 =    698456 bytes, avg  42.779
    Total footprint         :           =   1250392 bytes
    Average bucket size     :     0.816
    Variance of bucket size :     0.811
    Std. dev. of bucket size:     0.901
    Maximum bucket size     :         6
    StringTable statistics:
    Number of buckets       :      1009 =      8072 bytes, avg   8.000
    Number of entries       :    482764 =  11586336 bytes, avg  24.000
    Number of literals      :    482764 =  29845512 bytes, avg  61.822
    Total footprint         :           =  41439920 bytes
    Average bucket size     :   478.458
    Variance of bucket size :   432.042
    Std. dev. of bucket size:    20.786
    Maximum bucket size     :       547
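The runs show that with more buckets, hash collisions become less likely and interning takes less time: 393 ms with 200,000 buckets and 439 ms with 60,013 buckets, against 4,870 ms with only 1,009 buckets.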
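The "Average bucket size" line in each report is simply the number of entries divided by the number of buckets, which is the average chain length a lookup has to deal with. The following sketch recomputes it from the entry and bucket counts printed above; the class and method names are hypothetical, chosen only for this illustration.

```java
import java.util.Locale;

/** Recomputes the "Average bucket size" figure from the StringTable reports. */
class LoadFactorCheck {

    /** Average chain length = entries / buckets, formatted to three decimals. */
    static String averageBucketSize(long entries, long buckets) {
        return String.format(Locale.ROOT, "%.3f", (double) entries / buckets);
    }

    public static void main(String[] args) {
        // Entry and bucket counts copied from the three StringTable reports above.
        System.out.println(averageBucketSize(481494, 60013));  // first run, no StringTableSize flag
        System.out.println(averageBucketSize(481494, 200000)); // -XX:StringTableSize=200000
        System.out.println(averageBucketSize(482764, 1009));   // -XX:StringTableSize=1009
    }
}
```

The three values come out as 8.023, 2.407 and 478.458, matching the reports. An average chain of roughly 478 entries is what makes the 1,009-bucket run so much slower.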
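Why store strings in the string table at all? Because it saves space by avoiding duplicate string objects. A story that circulates online says that Twitter stored an address field as part of its user information, and that keeping those addresses without the string table would have taken about 30 GB of memory. Many of the addresses were duplicates, though, since a large number of users may all come from, say, Zhongguancun in Beijing. Twitter reportedly interned the address strings so that the string table held a single copy of each, which brought the memory footprint down to a few tens of MB.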
The following example illustrates the effect.
    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;

    /**
     * Demonstrates how intern reduces memory usage.
     * VM options: -Xms500m -Xmx500m -XX:+PrintStringTableStatistics -XX:StringTableSize=200000
     */
    public class Demo1_25 {
        public static void main(String[] args) throws IOException {
            List<String> address = new ArrayList<>();
            System.in.read(); // wait for a key press so a "before" sample can be taken
            for (int i = 0; i < 10; i++) {
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(new FileInputStream("linux.words"), "utf-8"))) {
                    String line = null;
                    long start = System.nanoTime();
                    while (true) {
                        line = reader.readLine();
                        if (line == null) {
                            break;
                        }
                        address.add(line);
                    }
                    // Elapsed time in milliseconds
                    System.out.println("cost:" + (System.nanoTime() - start) / 1000000);
                }
            }
            System.in.read(); // keep the JVM alive so an "after" sample can be taken
        }
    }
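Before the first key press, no data has been read. After it, the program reads the data and stores it in address. Take a sample with the Sampler in jvisualvm at each of these two points. The results are as follows.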
Then change the address.add() call in the code to:
    address.add(line.intern());
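The effect of that one-line change can also be checked without a profiler by counting how many distinct String objects the list ends up holding. The sketch below is a simplified stand-in for Demo1_25, not the original program: the class `InternDedupSketch`, its helper methods, and the three sample words are assumptions made for illustration, and `new String(...)` is used to mimic `readLine()` returning a fresh object for every line.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Counts retained String objects with and without intern(). */
class InternDedupSketch {

    /** Counts distinct String objects (by reference, not by equals) held in the list. */
    static int distinctObjects(List<String> list) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        identities.addAll(list);
        return identities.size();
    }

    /** Simulates reading the same words several times, as the loop in Demo1_25 does. */
    static List<String> load(String[] words, int rounds, boolean intern) {
        List<String> address = new ArrayList<>();
        for (int round = 0; round < rounds; round++) {
            for (String word : words) {
                // new String(...) mimics readLine(), which returns a fresh object each time.
                String line = new String(word.toCharArray());
                address.add(intern ? line.intern() : line);
            }
        }
        return address;
    }

    public static void main(String[] args) {
        String[] words = {"alpha", "beta", "gamma"}; // made-up sample data
        System.out.println(distinctObjects(load(words, 10, false))); // 30 separate objects
        System.out.println(distinctObjects(load(words, 10, true)));  // 3 shared objects
    }
}
```

Without intern(), the list keeps every duplicate alive. With intern(), the duplicates become garbage as soon as they are read, and the list only references one pooled object per distinct value.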