Bug 1080

Summary: Add support for memory mapped big file I/O via specialized InputStream and OutputStream, incl. mark/reset
Product: [JogAmp] Gluegen Reporter: Sven Gothel <sgothel>
Component: core Assignee: Sven Gothel <sgothel>
Status: RESOLVED FIXED    
Severity: enhancement    
Priority: ---    
Version: 2.3.0   
Hardware: All   
OS: all   
Type: FEATURE SCM Refs:
ae17a5895088e321bc373318cc1e144a2f822f29 95c4a3c7b6b256de4293ed1b31380d6af5ab59d0 92a6d2c1476fd562721f231f89afba9342ed8a20 00a9ee70054872712017b5a14b19aa92068c8420 a7a3d5ab98ee0ad33fdef50bf081afeb8295ebe4 bd240ebfe09b7c7a21689dee8be0cc673eb7f340
Workaround: ---

Description Sven Gothel 2014-09-25 23:42:36 CEST
It is desired to read and write big files via InputStream and OutputStream 
while having mark/reset supported in a most efficient way.

BufferedInputStream, which does support mark/reset,
can only handle up to 2GiB files due to byte[] usage.

This is even more restricted on some platforms, 
since it uses heap memory which might not be available.

Further, performance is not ideal.

+++

Add memory mapped InputStream and OutputStream implementations
supporting mark/reset.
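
A minimal sketch of why mark/reset is cheap on a ByteBuffer: the mark is just a saved position, so nothing is copied into a heap byte[]. The class name SketchByteBufferInputStream is hypothetical and this is not the GlueGen implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

/** Hypothetical sketch: an InputStream over a ByteBuffer with mark/reset. */
final class SketchByteBufferInputStream extends InputStream {
    private final ByteBuffer buf;
    private int mark = -1; // -1 means no mark has been set

    SketchByteBufferInputStream(final ByteBuffer buf) {
        this.buf = buf;
    }

    @Override
    public int available() {
        return buf.remaining();
    }

    @Override
    public boolean markSupported() {
        return true;
    }

    @Override
    public synchronized void mark(final int readLimit) {
        // readLimit is ignored: the data stays in the buffer, only the position is saved
        mark = buf.position();
    }

    @Override
    public synchronized void reset() throws IOException {
        if (mark < 0) {
            throw new IOException("mark not set");
        }
        buf.position(mark);
    }

    @Override
    public int read() {
        // Mask to return an unsigned value in 0..255, or -1 at end of stream
        return buf.hasRemaining() ? (buf.get() & 0xff) : -1;
    }

    @Override
    public int read(final byte[] b, final int off, final int len) {
        if (len == 0) {
            return 0;
        }
        if (!buf.hasRemaining()) {
            return -1;
        }
        final int n = Math.min(len, buf.remaining());
        buf.get(b, off, n);
        return n;
    }
}
```

Since a ByteBuffer is int-indexed, such a stream is still limited to 2GiB per buffer.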
Comment 1 Sven Gothel 2014-09-25 23:52:13 CEST
commit ae17a5895088e321bc373318cc1e144a2f822f29

Add read support for memory mapped big file I/O via specialized InputStream impl., incl. mark/reset

- ByteBufferInputStream simply impl. InputStream for an arbitrary 2GiB restricted ByteBuffer
  - Users may only need a smaller implementation for 'smaller' file sizes
    or for streaming a [native] ByteBuffer.

- MappedByteBufferInputStream impl. InputStream for any file size,
  while slicing the total size to memory mapped buffers via the given FileChannel.
  The latter are mapped lazily and diff. flush/cache methods are supported
  to ease virtual memory usage.

- TestByteBufferInputStream: Basic unit test for basic functionality and perf. stats.
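
The lazy slicing described above could look roughly like the following sketch. SketchSlicedFileReader, its fields and mappedSliceCount() are hypothetical names for illustration only. Flush/cache handling is omitted, and the actual MappedByteBufferInputStream differs.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical sketch: read any file size through lazily mapped slices. */
final class SketchSlicedFileReader {
    private final FileChannel fc;
    private final long totalSize;
    private final int sliceSize;
    private final ByteBuffer[] slices; // entries stay null until first access

    SketchSlicedFileReader(final FileChannel fc, final int sliceSize) throws IOException {
        this.fc = fc;
        this.totalSize = fc.size();
        this.sliceSize = sliceSize;
        // Round up so a trailing partial slice gets its own entry
        this.slices = new ByteBuffer[(int) ((totalSize + sliceSize - 1) / sliceSize)];
    }

    /** Maps slice idx on first use and returns the cached mapping afterwards. */
    private ByteBuffer slice(final int idx) throws IOException {
        ByteBuffer s = slices[idx];
        if (s == null) {
            final long start = (long) idx * sliceSize;
            final long len = Math.min(sliceSize, totalSize - start);
            s = fc.map(FileChannel.MapMode.READ_ONLY, start, len);
            slices[idx] = s;
        }
        return s;
    }

    /** Returns the unsigned byte at the absolute file position, or -1 if out of range. */
    int read(final long pos) throws IOException {
        if (pos < 0 || pos >= totalSize) {
            return -1;
        }
        final ByteBuffer s = slice((int) (pos / sliceSize));
        return s.get((int) (pos % sliceSize)) & 0xff;
    }

    /** Number of slices mapped so far. */
    int mappedSliceCount() {
        int n = 0;
        for (final ByteBuffer s : slices) {
            if (s != null) {
                n++;
            }
        }
        return n;
    }
}
```

Only the slices actually touched consume virtual address space, which is the point of mapping lazily.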
Comment 2 Sven Gothel 2014-09-26 12:30:30 CEST
95c4a3c7b6b256de4293ed1b31380d6af5ab59d0

Fix TestByteBufferInputStream: Handle OutOfMemoryError cause in IOException (Add note to FLUSH_NONE); Reduce test load / duration.
Comment 3 Sven Gothel 2014-09-26 12:31:10 CEST
92a6d2c1476fd562721f231f89afba9342ed8a20

Bug 1080 - Add write support for memory mapped big file I/O via specialized OutputStream impl.

Added MappedByteBufferOutputStream as a child instance of MappedByteBufferInputStream,
since the latter already manages the file's mapped buffer slices.

Current design is:
  - MappedByteBufferInputStream (parent)
    - MappedByteBufferOutputStream

this is due to InputStream and OutputStream not being interfaces,
but most functionality is provided in one class.

We could redesign both as follows:
  - MappedByteBufferIOStream (parent)
    - MappedByteBufferInputStream
    - MappedByteBufferOutputStream

This might visualize things better .. dunno whether it's worth the
extra redirection.

+++

MappedByteBufferInputStream:
  - Adding [file] resize support via custom FileResizeOp
  - All construction happens via ctors
  - Handle refCount, incr. by ctor and getOutputStream(..), decr by close
  - Check whether stream is closed already -> IOException
  - Simplify / Reuse code

MappedByteBufferOutputStream:
  - Adding simple write operations
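
The refCount handling noted above (incr. by ctor and getOutputStream(..), decr. by close) can be sketched as follows. SketchRefCountedResource and acquire() are hypothetical stand-ins, with acquire() playing the role of getOutputStream(..). The real classes manage mapped buffers rather than a flag.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical sketch: a resource shared by two streams, released on the last close. */
final class SketchRefCountedResource implements Closeable {
    private int refCount = 1; // the constructor accounts for the first user
    private boolean released = false;

    /** Registers one more user; fails if the resource has already been closed. */
    synchronized SketchRefCountedResource acquire() throws IOException {
        if (refCount == 0) {
            throw new IOException("stream closed");
        }
        refCount++;
        return this;
    }

    @Override
    public synchronized void close() {
        if (refCount > 0 && --refCount == 0) {
            released = true; // a real implementation would unmap its buffers here
        }
    }

    synchronized boolean isReleased() {
        return released;
    }
}
```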
Comment 4 Sven Gothel 2014-09-26 12:32:20 CEST
Basic functionality is now added, incl. unit tests,
which passed on Windows and GNU/Linux 32- and 64bit
using JRE7 and JRE8 (Oracle/OpenJDK).

Further refinements may happen via a followup bug report.
Comment 5 Sven Gothel 2014-09-27 19:56:08 CEST
To render the MappedByteBuffer*Stream more useful, 
we might add JNI native mmap and munmap ?

This would enhance 'flushing' of a mapped buffer slice
and hopping to the next.
Right now, we use an array of slices,
but native mmap/munmap could remove such use,
map the current 'window' directly
and also ensure the unmap and hence release.
Currently the unmap is only impl. in a fuzzy way,
i.e. via GC or private 'cleaner' method.

+++

A r/w method using ByteBuffers might be useful as well.

+++
Comment 6 Sven Gothel 2014-09-29 03:58:46 CEST
commit 00a9ee70054872712017b5a14b19aa92068c8420
  Refine MappedByteBuffer*Stream impl. and API [doc], 
  adding stream to stream copy 
  as well as direct memory mapped ByteBuffer access
Comment 7 Sven Gothel 2014-10-03 03:24:38 CEST
a7a3d5ab98ee0ad33fdef50bf081afeb8295ebe4

- Validate active and GC'ed mapped-buffer count
  in cleanAllSlices() via close() ..

- Fix missing unmapping last buffer in notifyLengthChangeImpl(),
  branch criteria was off by one.

- cleanSlice(..) now also issues cleanBuffer(..) on the GC'ed entry,
  hence if WeakReference is still alive, enforce its release.

- cleanBuffer(..) reverts FLUSH_PRE_HARD -> FLUSH_PRE_SOFT
  in case of an error.

- flush() -> flush(boolean metaData) to expose FileChannel.force(metaData).

- Add synchronous mode, flushing/syncing the mapped buffers when
  in READ_WRITE mapping mode and issue FileChannel.force() if not READ_ONLY.

  Above is implemented via flush()/flushImpl(..) for buffers and FileChannel,
  as well as in syncSlice(..) for buffers only.

  flush*()/syncSlice() is covered by:
    - setLength()
    - notifyLengthChange*(..)
    - nextSlice()

  Always issue flushImpl() in close().

- Windows: Clean all buffers in setLength(),
  otherwise Windows will report:

- Windows: Catch MappedByteBuffer.force() IOException

- Optimization of position(..)
  position(..) is now standalone to allow issuing flushSlice(..)
  before gathering the new mapped buffer.
  This shall avoid one extra cache miss.

  Hence rename positionImpl(..) -> position2(..).

- All MappedByteBufferOutputStream.write(..) methods
  issue syncSlice(..) on the last written current slice
  to ensure new 'synchronous' mode is honored.
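
A rough sketch of the flush(boolean metaData) idea described above: first sync the mapped buffers, then the FileChannel. SketchMappedFlush and its slice array parameter are assumptions for illustration. Only MappedByteBuffer.force() and FileChannel.force(boolean) are standard JDK calls, and the Windows-specific error handling is not shown.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical sketch: sync mapped slices, then the channel itself. */
final class SketchMappedFlush {
    private SketchMappedFlush() { }

    static void flush(final FileChannel fc, final MappedByteBuffer[] slices,
                      final boolean metaData) throws IOException {
        for (final MappedByteBuffer s : slices) {
            if (s != null) { // slices which were never mapped have nothing to sync
                s.force();   // write changes of this mapping to the storage device
            }
        }
        // metaData == true also forces file meta data updates, not only content
        fc.force(metaData);
    }
}
```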

+++

Unit tests:

- Ensure test files are being deleted

- TestByteBufferCopyStream: Reduced test file size to more sensible values.
Comment 8 Sven Gothel 2014-10-03 04:18:00 CEST
bd240ebfe09b7c7a21689dee8be0cc673eb7f340

MappedByteBufferInputStream: Default CacheMode is FLUSH_PRE_HARD now (was FLUSH_PRE_SOFT)

FLUSH_PRE_SOFT cannot be handled by some platforms, e.g. Windows 32bit.

FLUSH_PRE_HARD is the most reliable caching mode
and it will fall back to FLUSH_PRE_SOFT if no method for 'cleaner' exists.

Further, FLUSH_PRE_HARD turns out to be the fastest mode as well.