Known Issues in GemFire 8.1.0

Last updated: April 10, 2015

Id Bugnote title Bugnote description Workaround
#51967 gfsh hangs on Windows when starting locators or servers
Description: Starting locators or servers through gfsh on Windows causes gfsh to hang and become unresponsive. The only way to regain control of the console is to kill the process manually or to connect to it from a gfsh console in another window.
Workaround: Set gemfire.OSProcess.ENABLE_OUTPUT_REDIRECTION=true. For example, on Windows, use the following command to start a locator: gfsh start locator --name=locator --dir=. --J=-Dgemfire.OSProcess.ENABLE_OUTPUT_REDIRECTION=true
#51849 Calling Region.destroyRegion in one member while another member is initializing results in a hang
Description: In rare cases, if a member is in the middle of initialization, calling destroyRegion from another member (to destroy the region on all members) can result in a hang, with the message "GII failed from all sources, but members are still online. Retrying the GII" in the logs.
Workaround: Wait until all members are initialized before destroying a region.
#51815 Avoid hyphens in statistic archive file names
Description: Using a hyphen character in a statistic archive file name exposes a bug in which `archive-file-size-limit` and `archive-disk-space-limit` have no effect.
Workaround: Do not use hyphen characters in statistic archive file names, so that statistic archive files are limited in size and old files are cleaned up as expected.
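
A minimal sketch of the #51815 workaround, assuming the archive name is assembled in Java before being passed as the statistic-archive-file property. StatArchiveNames and withoutHyphens are hypothetical names, not GemFire API. The entry does not say whether hyphens in directory names also trigger the bug, so only the file-name part is rewritten.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of GemFire): replaces hyphens in the file-name
// part of a statistic-archive-file value, per the workaround for #51815.
final class StatArchiveNames {
    private StatArchiveNames() {}

    static String withoutHyphens(String archiveFile) {
        Path path = Paths.get(archiveFile);
        Path fileName = path.getFileName();
        if (fileName == null) {
            throw new IllegalArgumentException("No file name in: " + archiveFile);
        }
        // Only the file name is rewritten; directory components are left untouched.
        String safeName = fileName.toString().replace('-', '_');
        Path parent = path.getParent();
        return parent == null ? safeName : parent.resolve(safeName).toString();
    }
}
```

For example, withoutHyphens("server-1-stats.gfs") yields server_1_stats.gfs, which could then be set with props.setProperty("statistic-archive-file", ...).
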
#51799 GemFire requires log4j2; using log4j2 in an OSGi container logs the error "Unable to create context org.apache.logging.log4j.core.osgi.BundleContextSelector"
Description: log4j2 has a bug that causes it to log this error. The Activator for log4j2-core sets a system property that configures the ContextSelector to be org.apache.logging.log4j.core.osgi.BundleContextSelector, but log4j2-api does not import that package, so it cannot load the class. See https://issues.apache.org/jira/browse/LOG4J2-920
Workaround: Add org.apache.logging.log4j.core.osgi to the Import-Package header in the manifest for log4j2-api.jar.
#51798 Starting GemFire in Felix 4.x and above results in NoClassDefFoundError: javax/transaction/Synchronization
Description: This is caused by a bad interaction between the JDK and Felix. The JDK includes only some of the classes from the javax.transaction package. In non-OSGi environments, GemFire works around the issue by including the JTA classes as part of GemFire. However, Felix does not support loading classes in the same package from multiple JAR files. See this email thread for more information: http://mail-archives.apache.org/mod_mbox/felix-users/201211.mbox/%3CCAPr=90M+5vYjPqAvyTU+gYHr64y_FosBYELeUYcU_rFEJF3Cxw@mail.gmail.com%3E
Workaround: The easiest fix is to add the JTA classes to the boot class path: -Xbootclasspath/a:/path/to/jta-1.1.jar. Alternatively, set org.osgi.framework.system.packages in the Felix configuration file as described here: http://eclipse.org/jetty/documentation/current/framework-jetty-osgi.html#d0e20819
#51797 OQL query fails on a partitioned region when it contains reserved keywords in the ORDER BY clause
Description: If an OQL query with a reserved keyword such as "date" in its ORDER BY clause is executed on a partitioned region, it fails with QueryInvocationTargetException.
Workaround: None.
#51728 Using a PartitionedRegion as a log statement parameter hangs if all buckets cannot be created
Description: Passing a GemFire Region as a parameter to a Log4j 2 log statement that uses the default ParameterizedMessage causes ParameterizedMessage.recursiveDeepToString to be called on the Region, because a Region is a Map. A partitioned region is essentially a distributed hash map, so calling recursiveDeepToString on it can hang while the system attempts to create every bucket for the region before it has been initialized.
Workaround: Pass region.toString() as the log statement parameter instead of the region itself.
#51708 Use of monitorInterval in log4j2.xml negatively impacts performance
Description: The implementation of FileConfigurationMonitor in Log4j 2.1 makes each thread that logs a statement also responsible for checking the system clock against the interval and then checking the timestamp of the log4j2.xml file. See org.apache.logging.log4j.core.config.FileConfigurationMonitor.java
Workaround: Avoid using monitorInterval in log4j2.xml, especially in production systems or when benchmarking. Use monitorInterval only when debugging or experimenting.
#51645 SSL cipher suites do not work properly for the REST Management APIs
Description: If the HTTP service is SSL-enabled and configured with user-preferred cipher suites, the REST Management APIs (remote cluster management over HTTPS) do not work properly. Other HTTP services, such as Pulse and the REST Developer API, work as expected.
Workaround: Use the default cipher suites selected by the JVM.
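
To see which cipher suites the JVM selects by default (the ones the #51645 workaround relies on), the standard javax.net.ssl API can list them. This is a diagnostic sketch only: it does not read any GemFire SSL setting, and JvmCipherDefaults is a hypothetical class name.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;

// Lists the cipher suites the JVM's default SSLContext enables, i.e. the
// JVM-selected defaults that the #51645 workaround recommends relying on.
final class JvmCipherDefaults {
    private JvmCipherDefaults() {}

    static String[] enabledByDefault() {
        try {
            return SSLContext.getDefault().getDefaultSSLParameters().getCipherSuites();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    public static void main(String[] args) {
        for (String suite : enabledByDefault()) {
            System.out.println(suite);
        }
    }
}
```
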
#51621 Continuous execution of start/stop gateway-sender commands gives wrong results
Description: Federation takes at least two seconds to propagate the value of bean.isRunning() to the Manager. Because the command checks the value on the Manager node, it reports an error if it runs before the value has propagated. This can happen in scripts; the sleep command was introduced to avoid it. When typing commands manually, re-executing the command succeeds.
Workaround: Use the sleep command where needed in a script.
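
Instead of a fixed sleep, a script driver written in Java could poll until the Manager reports the expected state, allowing for the federation delay of at least two seconds described above. This is a sketch under assumptions: Await is a hypothetical helper, and the condition itself (for example, reading the sender MBean's isRunning() value through the Manager) is not shown.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Polls a condition until it holds or a timeout elapses. A sketch of an
// alternative to a fixed sleep between start/stop gateway-sender steps.
final class Await {
    private Await() {}

    static boolean until(BooleanSupplier condition, long timeout, long pollInterval, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        long pollNanos = Math.max(1, unit.toNanos(pollInterval));
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            try {
                TimeUnit.NANOSECONDS.sleep(Math.min(remaining, pollNanos));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

For example, Await.until(condition, 10_000, 250, TimeUnit.MILLISECONDS) waits up to 10 seconds, checking every 250 ms; these numbers are illustrative only.
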
#51590 Region creation may take a long time
Description: When using cluster configuration, region creation can take a long time if the system does not have direct Internet connectivity. This is because XML processing tries to resolve the DTD from the web without checking locally first. This issue does not affect GemFire 8.1.
Workaround: Set www.gemstone.com as an alias for localhost in the /etc/hosts file, or, if a proxy server that can reach the Internet exists, set http.proxyHost and http.proxyPort.
#51586 Queries invoking the size() method on the entire query result may fail with ClassCastException
Description: A query that invokes size() on the entire query result, such as (select * from /region where field1 ='str1' or field2 ='str2').size(), may fail with ClassCastException.
Workaround: Use count(*) instead of invoking size() on the entire query: select count(*) from /region where field1 ='str1' or field2 ='str2'
#51412 JSONFormatter automatically rounds BigDecimal to float and BigInteger to int
Description: JSONFormatter automatically rounds values of type BigDecimal to float because of a limitation of the Jackson parser it uses.
Workaround: Applications should specify BigDecimal values as strings (enclosed in double quotes) in the JSON document. These values are parsed and stored as strings inside GemFire, and applications are responsible for converting the string representation back into meaningful BigDecimal values.
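
A minimal sketch of the #51412 workaround: write each BigDecimal into the JSON document as a quoted string, then convert the stored string back after reading. JsonDecimals is hypothetical, and the sketch assumes that the field read back (for example through PdxInstance.getField on the result of JSONFormatter.fromJSON) is a String, as the workaround states.

```java
import java.math.BigDecimal;

// Sketch of the #51412 workaround: carry BigDecimal values through JSON as
// quoted strings and convert them back in application code.
final class JsonDecimals {
    private JsonDecimals() {}

    /** Renders a value as a quoted JSON string, so the parser treats it as text rather than a number. */
    static String quoted(BigDecimal value) {
        return "\"" + value.toPlainString() + "\"";
    }

    /** Converts the string stored in place of the number back into an exact BigDecimal. */
    static BigDecimal fromStored(Object storedField) {
        if (!(storedField instanceof String)) {
            throw new IllegalArgumentException("Expected a String field, got " + storedField);
        }
        return new BigDecimal((String) storedField);
    }
}
```

For example, a document built as "{\"price\":" + JsonDecimals.quoted(value) + "}" keeps every digit of value, whereas going through a double would not.
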
#51321 PDX ReflectionBasedAutoSerializer does not support java.util.Date fields that contain a subclass of Date
Description: If the PDX auto serializer is used to serialize a class with a field of type java.util.Date, and that field contains a subclass of Date, this exception is thrown during serialization: java.lang.IllegalArgumentException: writeDate only accepts instances of Date. Subclasses are not supported. Use writeObject for subclasses of Date.
Workaround: If possible, change the type of the field to the subclass instead of java.util.Date. Otherwise, create a subclass of ReflectionBasedAutoSerializer and override this method: public FieldType getFieldType(Field f, Class<?> clazz); In your overriding implementation, test f to see whether it is the java.util.Date field that will contain a subclass (you can also test whether clazz is java.util.Date.class). If it is, return FieldType.OBJECT; otherwise, return super.getFieldType(f, clazz).
#51103 SerializationException: Could not create an instance of com.gemstone.gemfire.internal.cache.tier.sockets.HAEventWrapper
Description: Product logs may show an exception string that reads: "SerializationException: Could not create an instance of com.gemstone.gemfire.internal.cache.tier.sockets.HAEventWrapper".
Workaround: The exception is harmless and can be ignored.
#51078 Backup on multi-host Windows platforms fails
Description: Due to a race condition while creating directories, a multi-host backup on Windows may fail with "IOException: Could not create directory".
Workaround: Use a directory that is local to all host machines in the system. See http://gemfire.docs.pivotal.io/latest/userguide/index.html?q=/latest/userguide/managing/disk_storage/backup_restore_disk_store.html
#51034 Due to host mapping issues, the destroy region command fails validation because of an empty response
Description: Depending on the configuration of /etc/hosts, users may encounter this issue. It is very similar to #46580 and #47645, which occur when host-IP mappings in the /etc/hosts file are missing or incorrect. JMX federation was failing because of #47645; that was resolved by removing the host name from the unique identifier. A similar fix is needed here when determining which members host a particular region. This issue will most likely go away with a proper host-IP mapping.
Workaround: Specify correct host-IP mappings in /etc/hosts.
#51024 Spurious warning: Message deserialization of <MessageType> ... did not read <XXX> bytes
Description: You may see the following warning in the GemFire log: Message deserialization of <MessageType> ... did not read <XXX> bytes. Some messages do not read all of their data when they detect another condition that causes them to stop early.
Workaround: This warning can be ignored.
#50950 In a star-pattern multi-site WAN configuration (all sites connected to each other), gateway events do not reach all receivers when a sender is stopped, even though an alternate path between the sender and receiver exists
Description: For example, in a star-pattern deployment with three sites named 'ln', 'ny' and 'tk', a put in 'ln' is sent from 'ln' to both 'ny' and 'tk'. However, if a gateway sender in site 'ln' is stopped while sending updates to the 'ny' site, a subsequent put in 'ln' is sent only to 'tk'. The event is not forwarded to 'ny' from 'tk', because 'tk' assumes it has already been sent to 'ny' by 'ln'.
Workaround: None.
#50773 Setting socket-lease-time too low can result in members being forced out of the distributed system
Description: If the socket-lease-time GemFire property is set to a small number, it may cause unexpected connectivity problems, such as ForcedDisconnectExceptions.
Workaround: Set socket-lease-time to a larger value. A safe minimum has not yet been determined, but problems have been seen when it is set to a value lower than 1000.
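
A hypothetical startup check for #50773 that flags a configured socket-lease-time below the value at which problems have been reported. The 1000 threshold (assumed to be milliseconds) comes from the entry above and is not a verified safe minimum. Exempting 0 assumes that zero means socket leases never expire; check the property reference for your version. SocketLeaseTimeCheck is not GemFire API.

```java
import java.util.Properties;

// Hypothetical startup check for #50773: flags a socket-lease-time that is
// positive but below the value at which problems have been observed.
final class SocketLeaseTimeCheck {
    private SocketLeaseTimeCheck() {}

    /** Problems have been seen below this value; it is not a verified safe minimum. */
    static final long OBSERVED_PROBLEM_THRESHOLD_MS = 1000;

    static boolean isRiskyValue(Properties gemfireProps) {
        String raw = gemfireProps.getProperty("socket-lease-time");
        if (raw == null) {
            return false; // not set, so GemFire's default applies
        }
        long value = Long.parseLong(raw.trim());
        // Assumption: 0 means "leases never expire" and is therefore not "too low".
        return value > 0 && value < OBSERVED_PROBLEM_THRESHOLD_MS;
    }
}
```
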
#50513 ClassCastException (Class cannot be cast to VersionResponse) occurs when a client (e.g. gfsh) attempts to connect without SSL to a Locator configured with SSL
Description: Suppose a Locator configured with SSL is started in gfsh, for example:
gfsh>start locator --name=LocatorWithSSL --port=12480 --log-level=config --properties-file=./conf/gemfire.properties --security-properties-file=./conf/gemfire-security.properties
If a client subsequently attempts to connect without SSL, GemFire throws the following exception:
[severe 2014/05/13 18:00:30.684 PDT Gfsh Launcher tid=0xb] (msgTID=11 msgSN=106) java.lang.ClassCastException: java.lang.Class cannot be cast to com.gemstone.org.jgroups.stack.tcpserver.VersionResponse
java.lang.IllegalStateException: java.lang.ClassCastException: java.lang.Class cannot be cast to com.gemstone.org.jgroups.stack.tcpserver.VersionResponse
    at com.gemstone.gemfire.management.internal.JmxManagerLocatorRequest.send(JmxManagerLocatorRequest.java:93)
    at com.gemstone.gemfire.management.internal.cli.commands.ShellCommands.connectToLocator(ShellCommands.java:516)
    at com.gemstone.gemfire.management.internal.cli.commands.ShellCommands.connect(ShellCommands.java:341)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.gemstone.gemfire.management.internal.cli.util.spring.ReflectionUtils.invokeMethod(ReflectionUtils.java:44)
    at com.gemstone.gemfire.management.internal.cli.shell.GfshExecutionStrategy.execute(GfshExecutionStrategy.java:104)
    at org.springframework.shell.core.AbstractShell.executeCommand(AbstractShell.java:223)
    at com.gemstone.gemfire.management.internal.cli.shell.Gfsh.executeCommand(Gfsh.java:414)
    at com.gemstone.gemfire.management.internal.cli.shell.Gfsh.promptLoop(Gfsh.java:864)
    at org.springframework.shell.core.JLineShell.run(JLineShell.java:158)
    at java.lang.Thread.run(Thread.java:695)
Caused by: java.lang.ClassCastException: java.lang.Class cannot be cast to com.gemstone.org.jgroups.stack.tcpserver.VersionResponse
    at com.gemstone.org.jgroups.stack.tcpserver.TcpClient.requestToServer(TcpClient.java:90)
    at com.gemstone.org.jgroups.stack.tcpserver.TcpClient.requestToServer(TcpClient.java:73)
    at com.gemstone.gemfire.management.internal.JmxManagerLocatorRequest.send(JmxManagerLocatorRequest.java:84)
    ... 13 more
Workaround: The client must connect to the Locator using SSL if the Locator was started with SSL configured.
#50412 Starting GatewaySenders and/or GatewayReceivers before region creation can cause several problems
Description: Starting GatewaySenders and/or GatewayReceivers before region creation can cause several problems, such as data loss on the receiver side and, when persistence is used, hangs in the HA scenario on the sender side.
Workaround: Start GatewaySenders and GatewayReceivers after the region is created.
#50322 Unexpected EOF when gunzipping a .gfs.gz file
Description: VSD currently requires statistic archives to be uncompressed before they can be loaded. However, gunzip may report "unexpected end of file" when you uncompress a .gfs.gz archive. This happens if the .gfs.gz file was not cleanly closed, which is the case if the server writing it is still running or if the server was killed or crashed.
Workaround: Use "gunzip -c stats.gfs.gz > stats.gfs" to uncompress the file. You will still see a message about an unexpected EOF, but it can be ignored, and you can then load stats.gfs into VSD.
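
Where gunzip is not convenient, the same recovery can be done in Java with java.util.zip.GZIPInputStream: copy everything that decompresses and treat the EOFException raised by a truncated archive as the end of the data. PartialGunzip is a hypothetical helper; as with gunzip -c, the tail that was never written cannot be recovered.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;

// Java analogue of "gunzip -c stats.gfs.gz > stats.gfs" for #50322: copies
// everything that can be decompressed and treats an early end of input
// (the "unexpected end of file" case) as the end of the data.
final class PartialGunzip {
    private PartialGunzip() {}

    /** Returns the number of decompressed bytes written to out. Closes gzipped, not out. */
    static long copyAvailable(InputStream gzipped, OutputStream out) {
        long total = 0;
        try (GZIPInputStream in = new GZIPInputStream(gzipped)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        } catch (EOFException truncated) {
            // Archive was not cleanly closed: keep the bytes recovered so far.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

For example, called with Files.newInputStream(Paths.get("stats.gfs.gz")) and Files.newOutputStream(Paths.get("stats.gfs")), it writes a stats.gfs that can be loaded into VSD; closing the output stream is left to the caller.
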
#50065 Data inconsistency in client with concurrent operations (destroy or invalidate + create) and concurrencyChecksEnabled
Description: An operation in progress on a partitioned region (PR) server node is applied to the cache, but before it can be distributed to clients, the VM's shutdown hook starts to close the cache. This preempts messaging and keeps the event from reaching clients. The server then restarts and recovers from disk, so it has the entry, but clients whose subscription queues are on other server nodes never see the event. This has been observed only with redundancy=0.
Workaround: Use redundancy=1 to resolve the issue.
#49520 Performance degradation with SSL-enabled WAN GatewaySender
Description: When SSL is enabled with a SerialGatewaySender, performance degrades to some extent.
Workaround: Either use a much less expensive cipher or switch to parallel WAN, which is available from GemFire 7.0.
#49409 With ParallelGatewaySender, extra directories are created for the logical disk store name
Description: With ParallelGatewaySender, extra directories are created with the logical name given to the disk store. For example, if "disk" is the logical disk store name, an extra, empty directory named "disk" is created. This does not cause any serious issues, only user confusion.
Workaround: None.
#48506 For a non-redundant partitioned region, single-hop stops working after the failed server is restarted
Description: The metadata version the client has is the same as the version the new server has (4 in this case). The server receiving the client request keeps telling the client to refresh its metadata by sending the current metadata version for whichever bucket is remote. Unfortunately, that version is exactly the same as the version the client already has, so the client never schedules a metadata refresh task.
Workaround: None.
#48141 AsyncEventQueue does not process events with local regions
Description: When an AsyncEventQueue is attached to a local region, events on the region are filtered internally and are not processed by the AsyncEventQueue.
Workaround: Use a local region with persistence.
#48123 Deploying a new function to GemFire with the Declarable interface and no properties fails
Description: When a new Function that implements the Declarable interface and has no properties (propertiesList has 0 elements) is deployed to GemFire, the deployment fails.
Workaround: Remove the Declarable interface from new functions.
#47790 Event loss on the remote site if GatewayReceiver is started before the user region is created
Description: On the remote site, if the GatewayReceiver is started before the user region is created, events may be lost on that site.
Workaround: Create user regions on the remote site before starting the GatewayReceivers.
#47733 GatewayReceiver started before creating the user region can cause RegionDestroyedException
Description: On a remote WAN site, if the GatewayReceiver is started before the user region is created, it can cause RegionDestroyedExceptions.
Workaround: On the remote WAN site, create user regions before starting the GatewayReceiver.
#47676 Join queries take a very long time to execute
Description: Queries using joins across multiple regions may take a long time to execute.
Workaround: Create indexes on the fields used in the join.
#47390 cacheClientProxyStats:messageQueueSize does not take into account the most recent events dispatched to the client
Description: Events that have already been dispatched are removed from the queue during a subsequent dispatch. So at any given time, the queue may contain events that have been dispatched to the client and acknowledged by it but not yet removed. These are removed during the next dispatch of a subsequent event.
Workaround: Use clientSubscriptionStats to compute the queue size as (eventsQueued - eventsDispatched - eventsRemovedByQRM - eventsExpired - eventsConflated). eventsConflated is zero if conflation is off, which is the default. eventsRemovedByQRM and eventsExpired are zero if the server has been the primary for this client throughout.
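
The queue-size formula in the workaround, written out as code with a worked example. ClientQueueSize is a hypothetical name; reading the counters from the statistics archive or the statistics API is not shown.

```java
// Worked form of the #47390 workaround: derives the effective client queue size
// from the clientSubscriptionStats counters named in that entry.
final class ClientQueueSize {
    private ClientQueueSize() {}

    static long estimate(long eventsQueued, long eventsDispatched, long eventsRemovedByQRM,
                         long eventsExpired, long eventsConflated) {
        return eventsQueued - eventsDispatched - eventsRemovedByQRM - eventsExpired - eventsConflated;
    }
}
```

With eventsQueued = 100, eventsDispatched = 60, eventsRemovedByQRM = 5, eventsExpired = 3 and eventsConflated = 2, the estimate is 100 - 60 - 5 - 3 - 2 = 30 events. With conflation off and the server primary throughout, the last three counters are zero and the formula reduces to eventsQueued - eventsDispatched.
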
#46878 ^Z will kill gfsh and any servers started from that gfsh
Description: If you type ^Z in gfsh, it kills your gfsh process and any locator or server processes that you started from that gfsh process. This also happens if you type ^Z while a shell script is executing gfsh.
Workaround: Use ^C instead of ^Z to interrupt a long-running gfsh command. This causes gfsh to stop waiting but leaves the child processes running.
#46230 JLine DLL issue when multiple instances of gfsh are started simultaneously
Description: On Windows, JLine uses DLLs to interact with the operating system to read special keys (such as the arrow keys) that are otherwise inaccessible through the System.in stream. In rare cases, two instances of gfsh may try to load the same DLL at the same time, which causes the second instance to fail.
Workaround: Restart the second instance of gfsh.
#46112 Moving gfsh to the background is not supported
Description: On Unix systems, jobs can be moved to the background (using Ctrl-Z) and later moved back to the foreground (using fg). This behavior is not supported for gfsh. When you type Ctrl-Z, gfsh exits with a "FATAL Exit".
Workaround: Do not move gfsh jobs to the background.
#45964 Hang during a distributed region destroy while persistent recovery is in progress
Description: If a running member initiates a distributed destroy of a persistent region using Region.destroyRegion while another member is recovering the region from disk, there is a slight chance that the distributed destroy and the member recovery will hang.
Workaround: Wait until all members are running before doing a distributed destroy. If the hang occurs, kill the member that is trying to recover from disk.
#44643 With SerialGatewaySenderQueue, extra directories are created for the logical disk store name
Description: With SerialGatewaySenderQueue, extra directories are created with the logical name given to the disk store. For example, if "disk" is the logical disk store name, an extra, empty directory named "disk" is created. This does not cause any serious issues, only user confusion.
Workaround: None.
#44606 Registration of instantiators can cause Gateway deadlock
Description: Gateways deadlock when trying to register instantiators.
Workaround: Register the instantiators in the hubs before creating the cache, using the serialization-registration cache XML element. This prevents the InternalInstantiator.sendRegistrationMessageToServers call.
#44558 Gateway.stop() does not clean up or destroy the region for the Gateway Event Queue
Description: Manually stopping a gateway through the API does not close the region backing the queue. This causes unnecessary event replication to the JVM containing the stopped gateway.
Workaround: The region is internal, but it can be retrieved and closed manually. The region is named gatewayHubId + "_" + gatewayId + "_EVENT_QUEUE":
    String gatewayRegionName = gatewayHubId + "_" + gatewayId + "_EVENT_QUEUE";
    Region region = cache.getRegion(gatewayRegionName);
    region.close();
Only close the region; do not destroy it, so that any persistent data is not deleted.
#44411 Querying on an enum field always returns an empty result set
Description: Querying on enum fields returns an empty result set even when there are qualifying rows.
Workaround: The only workaround is to use a bind parameter for the enum field in the query. For example, this query fails: select distinct * from /QueryRegion0 where aDay = Day.Wednesday. The query succeeds when it is rewritten as select distinct * from /QueryRegion0 where aDay = $1 and Day.Wednesday is passed as an execution parameter.
#44410 The load-conditioning-interval property does not work as expected when connecting to explicit endpoints
Description: When the load-conditioning-interval property is used with explicit servers instead of locators, connections are still recycled after 5 minutes. The property works as expected when you use locators to obtain connections for server communication.
Workaround: Use locators to obtain connections for server communication.
#44404 Partitioned region single hop may fail to direct load-balanced requests to a newly joined server when optimize-for-write is set to false
Description: This problem is caused by stale metadata on the client. It occurs when the client performs only read operations and a Function has optimize-for-write set to false. Any write operation into the region fixes the problem.
Workaround: Perform a write operation into the region.
#44399 Changing the distributed-system-id can cause PDX failures
Description: If the distributed-system-id is changed and a previously used ID is reused, PdxType conflicts can occur.
Workaround: Do not change the distributed-system-id after it has been set.
#44229 Destroy operation on a region causes an offline member to become unusable
Description: When some members are offline, a destroy (or local destroy) operation on a persistent region causes the offline members to be unable to start.
Workaround: Start all offline members before destroying a persistent region.
#43904 WAN Gateways started before regions are created can cause updates to be lost
Description: If gateways are restarted and connected to remote sites before the local regions are created, any events received by those gateways cause exceptions and are dropped. When gateways are defined in the same JVMs as the regions using XML, the proper startup order is maintained and this does not happen. When gateways are created and started in JVMs separate from those where the regions are created, the startup order may not be correct.
Workaround: Make sure that gateways are started after regions are created and initialized. When gateways are created and started in JVMs separate from those where the regions are created, start them manually after the regions are created. A RegionMembershipListener can be used to facilitate this.
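
For gateways started in separate JVMs, the ordering requirement can be captured by a small gate that runs the start action exactly once, after every required region has been reported as created. This is a sketch: StartAfterRegions is hypothetical, and feeding it from a RegionMembershipListener callback such as afterRemoteRegionCreate, as the workaround suggests, is assumed rather than shown.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: runs a start action (e.g. starting a gateway) exactly
// once, after every region in a required set has been reported as created.
final class StartAfterRegions {
    private final Set<String> pending;
    private final Runnable startAction;
    private final AtomicBoolean started = new AtomicBoolean(false);

    StartAfterRegions(Set<String> requiredRegionPaths, Runnable startAction) {
        if (requiredRegionPaths.isEmpty()) {
            throw new IllegalArgumentException("At least one region path is required");
        }
        this.pending = new HashSet<>(requiredRegionPaths);
        this.startAction = startAction;
    }

    /** Call whenever a region is known to exist, e.g. from a membership callback. */
    void regionCreated(String regionPath) {
        boolean ready;
        synchronized (pending) {
            pending.remove(regionPath);
            ready = pending.isEmpty();
        }
        // compareAndSet guarantees the action runs once even if notifications repeat.
        if (ready && started.compareAndSet(false, true)) {
            startAction.run();
        }
    }

    boolean hasStarted() {
        return started.get();
    }
}
```
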
#43866 Cache plugins may fail if read-serialized is true
Description: If read-serialized is set to true on your cache and your plugin classes (for example CacheListener, CacheWriter, CacheLoader) are serialized as PDX, the plugin fails because GemFire sees it as an instance of PdxInstance. This problem occurs only if your plugins are serialized as PDX because they implement PdxSerializable or because you have a PdxSerializer that serializes the plugin class. Classes that implement Function are never passed to a PdxSerializer, but they can still implement PdxSerializable and then fail just like the other plugins.
Workaround: Do not implement PdxSerializable in the plugin class, and do not have your PdxSerializer serialize it. Instead, make the plugin class implement DataSerializable, which prevents it from being serialized by a PdxSerializer.
#43849 Attempts to use a writable-working-dir over NFS may result in hangs involving NIO file locking
Description: Licensing uses java.nio.channels.FileChannel.lock to lock the license state and event files that are persisted to writable-working-dir. Over NFS, the call to FileChannel.lock may hang in the JVM native layer. The stack dump of the hung thread may look similar to the following:
java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.FileChannelImpl.lock0(Native Method)
    at sun.nio.ch.FileChannelImpl.lock(FileChannelImpl.java:832)
    at java.nio.channels.FileChannel.lock(FileChannel.java:860)
    at com.springsource.vfabric.licensing.events.EventManager.saveEvents(EventManager.java:61)
    - locked <0xe02f5a80> (a java.lang.Object)
    at com.springsource.vfabric.licensing.events.EventManager.saveEvent(EventManager.java:45)
    at com.springsource.vfabric.licensing.events.EventManager.<init>(EventManager.java:37)
    at com.springsource.vfabric.licensing.client.LicenseManagerEnvironment.<init>(LicenseManagerEnvironment.java:61)
    at com.springsource.vfabric.licensing.client.LicenseManagerFactory.getLicenseManager(LicenseManagerFactory.java:80)
    at com.gemstone.gemfire.internal.licensing.VFabricLicenseEngine.getLicenseManager(VFabricLicenseEngine.java:398)
    at com.gemstone.gemfire.internal.licensing.VFabricLicenseEngine.acquireLicense(VFabricLicenseEngine.java:93)
    at com.gemstone.gemfire.internal.licensing.CacheLicenseChecker.acquireLicense(CacheLicenseChecker.java:76)
    at com.gemstone.gemfire.internal.licensing.LicenseChecker.acquireLicense(LicenseChecker.java:251)
    - locked <0xe02604c8> (a com.gemstone.gemfire.internal.licensing.ServerLicenseChecker)
    at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.getLicenseChecker(InternalDistributedSystem.java:635)
    - locked <0xdfd56c28> (a java.util.concurrent.atomic.AtomicReference)
    at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.initialize(InternalDistributedSystem.java:470)
    at com.gemstone.gemfire.distributed.internal.InternalDistributedSystem.newInstance(InternalDistributedSystem.java:223)
    at com.gemstone.gemfire.distributed.DistributedSystem.connect(DistributedSystem.java:932)
Workaround: Specify a directory on a local drive for writable-working-dir instead of a directory accessed through NFS. The writable-working-dir property is specified in gemfire.properties.
#43758 Suspended transaction from function execution is unusable after primary rebalancing
Description: When multiple invocations of a function participate in a single transaction (suspending and resuming the transaction for each invocation), a high-availability event may rebalance the primaries, which makes it impossible to target the original transactional node for function execution.
Workaround: Use the system property gemfire.DISABLE_MOVE_PRIMARIES_ON_STARTUP to allow function execution to target the same member.
#43750 Gateway toString erroneously indicates that the Gateway is connected
Description: The Gateway toString message indicates that the Gateway is connected to the remote site even when it is not. For example:
[info 2011/07/26 15:46:57.598 EDT <main> tid=0x1] Started Primary Gateway to LN connected to [LN-1=ln_host_1:6622, LN-2=ln_host_2:6622]
Workaround: To determine whether the Gateway failed to connect to the remote site, look for a warning similar to the following:
[warning 2011/07/26 15:46:57.527 <main> tid=0x1] Primary Gateway to LN not connected to [LN-1=ln_host_1:6622, LN-2=ln_host_2:6622]: Could not connect.
To determine when the Gateway successfully connects to the remote site, look for a message similar to the following:
[info 2011/07/26 16:07:36.187 EDT <Gateway Event Processor from NY to LN> tid=0x154] Primary Gateway to LN connected to [LN-1=ln_host_1:6622, LN-2=ln_host_2:6622]: Using com.gemstone.gemfire.cache.client.internal.pooling.PooledConnection@1bb1849: Connection[ln_host_2:6622] after 81 failed connect attempts
#43713 JRockit may crash with an illegal memory access
Description: JRockit may crash with an illegal memory access. The specific version this was seen with during testing was BEA JRockit(R) R27.6.5-32_o-121899-1.6.0_14-20091001-2107-windows-ia32. The call stack looked like this:
Thread Stack Trace:
    at findNext+288()@0xffffffff7ddbc9f4
    at findNextToReturn+32()@0xffffffff7ddbca94
    at refIterFillFromFrame+248()@0xffffffff7ddbcd2c
    at trProcessLocksForThread+52()@0xffffffff7ddcb1c0
    at get_all_locks+88()@0xffffffff7dcee638
    at javaLockConvertLazyToThin+88()@0xffffffff7dcee730
    at RJNI_jrockit_vm_Locks_checkLazyLocked+584()@0xffffffff7dcf01d8
Workaround: The following may help:
- Turn off optimizations with the -XnoOpt option. This option turns off adaptive optimization. Optimized code generally runs faster than unoptimized code, but occasionally the time required to optimize code causes undesirable processing delays; -XnoOpt avoids these delays by turning off optimization. The option is also useful when you suspect that a JVM or application problem, such as a system crash or poor startup performance, is related to optimization: turn optimization off and retry your application, and if it then runs successfully, you can assume that the problem lies with code optimization. For more information, see the -XnoOpt topic: http://download.oracle.com/docs/cd/E13150_01/jrockit_jvm/jrockit/jrdocs/refman/optionX.html#wp1020479
- Upgrade to the latest JRockit version, as many problems are fixed simply by upgrading.
- As a last resort, contact the Oracle WebLogic Support team.
#43673 Using the query "select * from /exampleRegion.entrySet" fails in a client-server topology and/or in a PartitionedRegion
Description: The following exception is thrown:
Exception in thread "main" com.gemstone.gemfire.cache.client.ServerOperationException: com.gemstone.gemfire.SerializationException: failed serializing object
    at com.gemstone.gemfire.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:530)
    ...
Caused by: com.gemstone.gemfire.SerializationException: failed serializing object
    at com.gemstone.gemfire.internal.cache.tier.sockets.BaseCommand.writeQueryResponseChunk(BaseCommand.java:750)
    ...
Caused by: java.io.NotSerializableException: com.gemstone.gemfire.internal.cache.LocalRegion$NonTXEntry
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1164)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:330)
    at com.gemstone.gemfire.internal.InternalDataSerializer.writeSerializableObject(InternalDataSerializer.java:2032)
    ...
Workaround: Use "select e.key, e.value from /exampleRegion.entrySet e" and construct the entry objects in the application that uses GemFire.
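
A sketch of the second half of the #43673 workaround: turning the key/value rows returned by "select e.key, e.value from /exampleRegion.entrySet e" back into entry objects. It assumes each result row has already been converted to an Object[] of {key, value}, for example with Struct.getFieldValues(); EntryRows is hypothetical. The entries are detached copies, so they cannot be used to update the region.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: rebuilds Map.Entry objects from {key, value} rows.
final class EntryRows {
    private EntryRows() {}

    static <K, V> List<Map.Entry<K, V>> toEntries(List<Object[]> rows, Class<K> keyType, Class<V> valueType) {
        List<Map.Entry<K, V>> entries = new ArrayList<>(rows.size());
        for (Object[] row : rows) {
            if (row.length != 2) {
                throw new IllegalArgumentException("Expected {key, value}, got " + row.length + " fields");
            }
            // Class.cast fails fast if a row does not hold the expected types.
            entries.add(new AbstractMap.SimpleImmutableEntry<>(keyType.cast(row[0]), valueType.cast(row[1])));
        }
        return entries;
    }
}
```
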
#43607 DynamicRegionFactory with registerInterest on a client may cause dynamic subregions to be lost
Description: If the client loses redundancy, registerInterest destroys any dynamic subregions.
Workaround: Set subscription-redundancy to a non-zero value or disable registerInterest on DynamicRegionFactory.
#43545 Cache close on a client waits until all operations in progress have completed
Description: Operations such as putAll take the timeout value as an input parameter and may not close their sockets while they are in progress, so closing the cache on a client waits for them to finish. This is a corner case.
Workaround: Keep putAll operations small, or allow a longer wait time for the client to shut down.
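
One way to keep putAll operations small, as #43545 advises, is to split a large map into fixed-size batches and call putAll once per batch. PutAllBatches is a hypothetical helper, and this entry recommends no particular batch size; it should be chosen by measurement.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: splits one large map into batches of at most batchSize
// entries, preserving iteration order, so each putAll call stays small.
final class PutAllBatches {
    private PutAllBatches() {}

    static <K, V> List<Map<K, V>> split(Map<K, V> all, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Map<K, V>> batches = new ArrayList<>();
        Map<K, V> current = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : all.entrySet()) {
            current.put(e.getKey(), e.getValue());
            if (current.size() == batchSize) {
                batches.add(current);
                current = new LinkedHashMap<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Usage would look like for (Map<String, Order> batch : PutAllBatches.split(orders, 500)) { region.putAll(batch); }, where orders, Order and 500 are illustrative.
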
#43536 Function API classes must be included in the CLASSPATH The function APIs deserialize function results, filters, arguments, and the functions themselves early, during messaging. Therefore, the classes for these objects must be on the JVM's classpath. It is not possible to install your own class loader just before you read a function result or pass the arguments to your code. Add the classes for functions, function arguments, function filters, and function results to the CLASSPATH.
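Because these classes must be loadable from the JVM's own classpath, a startup check can report a missing JAR before the first function call fails. A minimal sketch, assuming you supply the fully qualified class names yourself; ClasspathCheck is a hypothetical helper and com.example.MyFunctionArgs is a placeholder, not a real GemFire class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClasspathCheck {

    // Returns the class names that the system class loader (the loader backed by
    // the JVM's CLASSPATH) cannot load.
    public static List<String> missingClasses(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        for (String name : classNames) {
            try {
                // initialize = false: only check that the class can be found and linked,
                // without running its static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> required = Arrays.asList(
                "java.lang.String",            // always present
                "com.example.MyFunctionArgs"); // hypothetical application class
        System.out.println("Missing from CLASSPATH: " + missingClasses(required));
    }
}
```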
#42452 In client-server function execution, Execution.execute() blocks until the ResultCollector has all results In client-server function execution, Execution.execute() is a blocking call that waits until the ResultCollector has been populated with all results. In the peer-to-peer case, it is a non-blocking call. Make the client-side function execution non-blocking by running it on a separate thread, for example through an ExecutorService, as in the following fragment. Note that ExecutorService.invokeAll returns only after all submitted tasks have completed, so the thread that runs this fragment still waits: List futures = null; try { futures = execService.invokeAll(callableTasks); } catch (RejectedExecutionException rejectedExecutionEx) { throw rejectedExecutionEx; } catch (InterruptedException e) { throw new InternalGemFireException(e.getMessage()); } if (futures != null) { Iterator itr = futures.iterator(); while (itr.hasNext() && !execService.isShutdown() && !execService.isTerminated()) { Future fut = (Future)itr.next(); try { fut.get(); } ...
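A minimal sketch of moving a blocking call off the caller's thread with the JDK's CompletableFuture, so the caller blocks only when it actually needs the results. The method blockingExecute is a hypothetical stand-in for a client-side call such as execution.execute(...).getResult(); here it only sleeps and returns a fixed list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NonBlockingExecution {

    // Hypothetical stand-in for a blocking client-side function execution.
    static List<String> blockingExecute() throws InterruptedException {
        Thread.sleep(200);
        return Arrays.asList("result-1", "result-2");
    }

    // Starts the blocking call on the given executor and returns immediately.
    public static CompletableFuture<List<String>> executeAsync(ExecutorService executor) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return blockingExecute();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                throw new IllegalStateException("function execution interrupted", e);
            }
        }, executor);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<List<String>> future = executeAsync(executor);
            System.out.println("caller continues while the function runs");
            // Block only when the results are needed, with an upper bound on the wait.
            System.out.println(future.get(5, TimeUnit.SECONDS));
        } finally {
            executor.shutdown();
        }
    }
}
```

Unlike invokeAll, supplyAsync returns at once; the dedicated executor keeps these blocking calls out of the common fork-join pool.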
#42381 Cache creation does not fail if an index configured in cache.xml cannot be created If index creation fails during cache creation (for example, when GemFire starts up from a cache.xml file), the cache is still created. Ideally, cache creation would fail, which would prevent users from querying indexes that do not exist. Check the statistics file to confirm that all indexes were created. If a query takes longer than expected, analyze it to verify that it uses the expected index.
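Besides checking statistics, an application can compare the indexes it expects against the ones that actually exist after startup. A minimal plain-Java sketch, assuming the actual names are collected from QueryService.getIndexes() by calling Index.getName() on each index; IndexCheck and the index names idIndex and statusIndex are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

public class IndexCheck {

    // Returns the expected index names that are not among the actual names,
    // in the order they were expected.
    public static Set<String> missingIndexes(Collection<String> expected, Collection<String> actual) {
        Set<String> missing = new LinkedHashSet<>(expected);
        missing.removeAll(actual);
        return missing;
    }

    public static void main(String[] args) {
        // Names declared in cache.xml (hypothetical).
        Collection<String> expected = Arrays.asList("idIndex", "statusIndex");
        // Names actually created; with GemFire these could come from
        // cache.getQueryService().getIndexes(), mapping each Index to getName().
        Collection<String> actual = Arrays.asList("idIndex");
        Set<String> missing = missingIndexes(expected, actual);
        if (!missing.isEmpty()) {
            // A real application might refuse to finish starting up here.
            System.out.println("Indexes not created at startup: " + missing);
        }
    }
}
```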
#42041 Calling FunctionService.onServers repeatedly can cause socket exhaustion Heavy use of FunctionService.onServers from a client can cause sockets to churn, which can lead to "Too many open files" errors on the locator. If you see "Too many open files" errors when repeatedly calling FunctionService.onServers(), increase the open-file limit on the host (the ulimit setting on UNIX-like systems). On Windows, change TcpIP/Parameters/NumConnections in the registry.
#40791 Applications that use GemFire client caches should call Cache.close followed by DistributedSystem.disconnect If an application that uses a client cache does not call DistributedSystem.disconnect(), it may encounter stale data when it reopens the cache and subscribes to updates. Applications that use GemFire client caches should call Cache.close() followed by DistributedSystem.disconnect().
#40693 An explicit cache destroy of an entry can be lost (not applied to the backend database) An explicit cache destroy of an entry is not applied to the backend database if the entry has already been destroyed by eviction or expiration. In that case, region.destroy(key) throws EntryNotFoundException. The application can catch the exception, load the entry, and then retry the destroy operation to remove the entry from the database.
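The catch, load, and retry sequence can be sketched in plain Java. Everything below except the pattern itself is hypothetical: SimpleRegion stands in for a GemFire Region whose get() loads missing entries (as a CacheLoader would) and whose destroy() writes through to a database map (assuming, as an illustration, that a CacheWriter propagates destroys), and the local EntryNotFoundException stands in for GemFire's exception of the same name.

```java
import java.util.HashMap;
import java.util.Map;

public class DestroyWithReload {

    // Hypothetical stand-in for com.gemstone.gemfire.cache.EntryNotFoundException.
    static class EntryNotFoundException extends RuntimeException {
        EntryNotFoundException(String key) { super("entry not found: " + key); }
    }

    // Hypothetical, simplified region backed by a "database" map.
    static class SimpleRegion<K, V> {
        private final Map<K, V> cache = new HashMap<>();
        private final Map<K, V> database;

        SimpleRegion(Map<K, V> database) { this.database = database; }

        // Loads a missing entry from the database, like a CacheLoader.
        V get(K key) { return cache.computeIfAbsent(key, database::get); }

        // Simulates eviction or expiration: the entry leaves the cache only.
        void evict(K key) { cache.remove(key); }

        // Removes the entry from cache and database, but only if it is cached.
        void destroy(K key) {
            if (!cache.containsKey(key)) {
                throw new EntryNotFoundException(String.valueOf(key));
            }
            cache.remove(key);
            database.remove(key);
        }
    }

    // The workaround: if the entry was already evicted or expired, load it and retry.
    static <K, V> void destroyWithReload(SimpleRegion<K, V> region, K key) {
        try {
            region.destroy(key);
        } catch (EntryNotFoundException e) {
            region.get(key);     // reloads the entry into the cache
            region.destroy(key); // retry; this destroy now reaches the database
        }
    }

    public static void main(String[] args) {
        Map<String, String> database = new HashMap<>();
        database.put("order-1", "pending");
        SimpleRegion<String, String> region = new SimpleRegion<>(database);
        region.get("order-1");
        region.evict("order-1");
        destroyWithReload(region, "order-1");
        System.out.println("still in database? " + database.containsKey("order-1"));
    }
}
```

If the key does not exist in the database either, the retry throws EntryNotFoundException again, which correctly signals that there is nothing to destroy.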
#40624 The EnforceUniqueHostStorageAllocation feature requires that no two systems share an IP address Using the EnforceUniqueHostStorageAllocation feature requires that no two systems hosting members of a DistributedSystem share the same IP address. This is true even if the network adapter is in a "DOWN" state. The exceptions to this rule are the loopback address and the "any" (wildcard) address, 127.0.0.1 and 0.0.0.0 respectively. When two members do share an IP address and the EnforceUniqueHostStorageAllocation system property is set to "true", the symptom is a message in the logs similar to the following: system.log: [warning 2009/04/21 10:00:41.290 PDT gemfire1_10503 <thread 1> tid=0x79] Unable to find sufficient members to host a bucket in the partitioned region. Region name = /partitionedRegion Current number of available data stores: 10 number successfully allocated = 3 number needed = 4 Data stores available: [ptestg(13629):58399/50210, lewis(10584):42395/52373, ptestg(13632):58401/50211, ptesth(8852):57714/32881, king(10497):37041/62411, lewis(10582):42398/52374, king(10501):37037/62412, ptesth(8850):57715/32882, king(10499):37039/62407, king(10503):37044/62414] Data stores successfully allocated: [king(10497):37041/62411, lewis(10582):42398/52374, ptesth(8850):57715/32882] Consider starting another member Remove duplicate IP addresses.
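To find duplicates, it helps to list every address each host reports and compare the lists across hosts. A minimal sketch using java.net.NetworkInterface; the class name LocalAddresses is hypothetical. Whether the JVM reports adapters that are down depends on the platform, so also check the operating system's network configuration.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

public class LocalAddresses {

    // Collects the addresses of all network interfaces the JVM reports on this host,
    // skipping loopback (e.g. 127.0.0.1) and wildcard (0.0.0.0) addresses, which are
    // exempt from the uniqueness rule. No isUp() filter is applied, so an interface
    // the JVM reports as down is still listed.
    public static List<InetAddress> nonLoopbackAddresses() {
        List<InetAddress> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return result; // no interfaces reported
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    if (!addr.isLoopbackAddress() && !addr.isAnyLocalAddress()) {
                        result.add(addr);
                    }
                }
            }
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) throws SocketException {
        for (InetAddress addr : nonLoopbackAddresses()) {
            NetworkInterface nic = NetworkInterface.getByInetAddress(addr);
            String state = (nic != null && nic.isUp()) ? "up" : "down or unknown";
            System.out.println(addr.getHostAddress() + " (" + state + ")");
        }
    }
}
```

Run it on every host that will run a member; any address printed on more than one host is a duplicate to remove.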
#39977 NoSubscriptionServersAvailableException while creating a client with security On some platforms, calling getCredentials on the provided PKCSAuthInit template can be slow the first time it is called. This can cause a timeout on the server while the connection is being created, resulting in a NoSubscriptionServersAvailableException on the client. Set the system property BridgeServer.acceptTimeout to a higher value. The default is 9900 milliseconds.
#39541 Threads hang while blocking for synchronization in JRockit On Java SE 6 versions of the JRockit JVM, one or more threads appear to hang while blocking on a synchronization lock that is not held by any other thread. According to the JRockit documentation: "In R27.5 lazy unlocking is enabled by default in Java SE 6 versions of JRockit JVM on all platforms except IA64 and with all garbage collection modes except the deterministic garbage collection mode." Disable lazy unlocking with the JVM option -XXlazyUnlocking:enable=false. We have found that disabling JRockit's lazyUnlocking seems to prevent these hangs.
#39139 Lease expiration causes locking to hang Lease expiration can cause all other lock requests on the DistributedLockService to hang. Operations on regions with global scope may hang for the same reason. Set lock-lease to -1 to prevent lease expiration.
#38250 NotSerializableException can block cache access if it occurs in a region with global or distributed-ack scope If the application tries to put an instance that is not serializable into a region with global or distributed-ack scope, the application blocks (hangs) and does not recover. Before any put or create operation, check that the object in question is an instance of java.io.Serializable.
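A minimal sketch of such a guard, assuming the values rely on plain Java serialization; SerializableGuard is a hypothetical helper. implementsSerializable is the instanceof check described in the workaround. canSerialize is a stricter addition of this sketch: it actually serializes the object graph into a discarding stream, which also catches a Serializable object that holds a non-serializable field, at the cost of serializing each value one extra time.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

public class SerializableGuard {

    // An OutputStream that discards all bytes; only success or failure matters here.
    private static final class DiscardingOutputStream extends OutputStream {
        @Override
        public void write(int b) {
            // discard
        }

        @Override
        public void write(byte[] b, int off, int len) {
            // discard
        }
    }

    // The check described in the workaround: does the value implement Serializable?
    public static boolean implementsSerializable(Object value) {
        return value instanceof Serializable;
    }

    // Stricter check: serialize the whole object graph and report whether it succeeded.
    public static boolean canSerialize(Object value) {
        if (!implementsSerializable(value)) {
            return false;
        }
        try (ObjectOutputStream out = new ObjectOutputStream(new DiscardingOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (IOException e) {
            // NotSerializableException and other ObjectStreamExceptions land here.
            return false;
        }
    }

    // Guard to call before region.put(key, value) or region.create(key, value).
    public static void requireSerializable(Object value) {
        if (!canSerialize(value)) {
            String type = (value == null) ? "null" : value.getClass().getName();
            throw new IllegalArgumentException("value is not serializable: " + type);
        }
    }
}
```

For example, an ArrayList holding a plain java.lang.Object passes implementsSerializable but fails canSerialize, which is the case the simple instanceof check misses.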
#37158 Interrupting threads using DistributedLockService causes other members to hang or generate large log files Indications that this problem has occurred include log statements such as "Grantor is still initializing" and "Grantor creation was aborted but grantor was not destroyed". If these appear in the log, a thread was interrupted while using the DistributedLockService, and the member must be disconnected from the DistributedSystem. Other members may hang and may produce very large log files. Disconnecting this member from the DistributedSystem allows the other members to continue working without further problems. Do not interrupt any thread that may be using the DistributedLockService API. Use waitTimeMillis to specify how long a lock request waits; the thread stops waiting once the request times out. Disconnecting from the DistributedSystem causes any waiting threads to return.