12 December 2018

When standards aren't... aka how do I deal with multiple file attachments in InfoPath

As we all know, Microsoft has been proclaiming InfoPath to be "dead" for many years now, but it wasn't until recently, with PowerApps, that we saw any glimmer of hope for a replacement. As such, InfoPath lives on and gets migrated from one version of SharePoint to the next, version after version after version...

One of the things that made InfoPath special was the ability for users and designers to tinker with the template, move stuff around, and add and remove fields. SharePoint would maintain all the historic schemas, so that any older content created with a given schema could be opened with that schema when needed.
Enter migration time. Cue the dramatic music please.

All migration tools, that's right, every single last one of them, will only migrate the latest .xsn schema when migrating InfoPath libraries, which of course means that ALL forms created with older schemas are now broken! This stems from the fact that migration tools use the APIs provided by Microsoft, and those APIs do not provide access to older schemas.
Being the industrious little problem solver that I am, I wrote my own tool to work around this problem. The tool simply downloads the latest .xsn from the target library, extracts the template.xml and manifest.xsf for parsing, and then iterates the schema node by node while seeking matching nodes in the original source XML data and reconstructing the new target XML on the fly. Upon completion the new XML document is uploaded to the target and is fully usable as is. This works GREAT!

Then we encountered multiple file attachments.
Now I don't know if you've ever ventured down the rabbit hole that is InfoPath file attachments, but let me tell you... it's a mess! What little information is available on the web is old, outdated and hardly applicable. Oh, and did I mention that nobody, and I mean NOBODY, talks about dealing with MULTIPLE FILE ATTACHMENTS!!!

The first glimpse of hope I could glean from scouring the internet was a Microsoft support article titled "How to encode and decode a file attachment programmatically by using Visual C# in InfoPath 2003".
Right in the introduction, the article had a reference pointing to an updated version titled "How to encode and decode a file attachment programmatically by using Visual C# in InfoPath 2010 or in InfoPath 2007".
I read both to be sure. There's very little difference from a code perspective between the two, but I'll reference code from the 2010 example here.

The article first walks you through the InfoPathAttachmentEncoder class, which is pretty straightforward, but it is important to note that the file name is encoded as Unicode (Encoding.Unicode, i.e. UTF-16) before it is written into the attachment header, while the contents of the file on disk are copied in as raw bytes. The ToBase64Transform is then used to convert that binary buffer into Base64 text that can fit in XML. Finally, the resulting memory stream is read back as ASCII text and returned to the caller to be inserted into the InfoPath form as an attachment.


using System;
using System.IO;
using System.Text;
using System.Security.Cryptography;
using InfoPathAttachmentEncoding;
namespace InfoPathAttachmentEncoding
{
    /// <summary>
    /// InfoPathAttachment encodes file data into the format expected by InfoPath for use in file attachment nodes.
    /// </summary>
    public class InfoPathAttachmentEncoder
    {
        private string base64EncodedFile = string.Empty;
        private string fullyQualifiedFileName;

        /// <summary>
        /// Creates an encoder to create an InfoPath attachment string.
        /// </summary>
        /// <param name="fullyQualifiedFileName"></param>
        public InfoPathAttachmentEncoder(string fullyQualifiedFileName)
        {
            if (fullyQualifiedFileName == string.Empty)
                throw new ArgumentException("Must specify file name", "fullyQualifiedFileName");
            if (!File.Exists(fullyQualifiedFileName))
                throw new FileNotFoundException("File does not exist: " + fullyQualifiedFileName, fullyQualifiedFileName);
            this.fullyQualifiedFileName = fullyQualifiedFileName;
        }

        /// <summary>
        /// Returns a Base64 encoded string.
        /// </summary>
        /// <returns>String</returns>
        public string ToBase64String()
        {
            if (base64EncodedFile != string.Empty)
                return base64EncodedFile;

            // This memory stream will hold the InfoPath file attachment buffer before Base64 encoding.
            MemoryStream ms = new MemoryStream();

            // Obtain the file information.
            using (BinaryReader br = new BinaryReader(File.Open(fullyQualifiedFileName, FileMode.Open, FileAccess.Read, FileShare.Read)))
            {
                string fileName = Path.GetFileName(fullyQualifiedFileName);
                uint fileNameLength = (uint)fileName.Length + 1;
                byte[] fileNameBytes = Encoding.Unicode.GetBytes(fileName);

                using (BinaryWriter bw = new BinaryWriter(ms))
                {
                    // Write the InfoPath attachment signature.
                    bw.Write(new byte[] { 0xC7, 0x49, 0x46, 0x41 });

                    // Write the default header information.
                    bw.Write((uint)0x14); // size
                    bw.Write((uint)0x01); // version
                    bw.Write((uint)0x00); // reserved

                    // Write the file size.
                    bw.Write((uint)br.BaseStream.Length);

                    // Write the size of the file name.
                    bw.Write((uint)fileNameLength);

                    // Write the file name (Unicode encoded).
                    bw.Write(fileNameBytes);

                    // Write the file name terminator. This is two nulls in Unicode.
                    bw.Write(new byte[] { 0, 0 });

                    // Iterate through the file reading data and writing it to the outbuffer.
                    byte[] data = new byte[64 * 1024];
                    int bytesRead = 1;
                    while (bytesRead > 0)
                    {
                        bytesRead = br.Read(data, 0, data.Length);
                        bw.Write(data, 0, bytesRead);
                    }
                }
            }

            // This memorystream will hold the Base64 encoded InfoPath attachment.
            MemoryStream msOut = new MemoryStream();
            using (BinaryReader br = new BinaryReader(new MemoryStream(ms.ToArray())))
            {
                // Create a Base64 transform to do the encoding.
                ToBase64Transform tf = new ToBase64Transform();
                byte[] data = new byte[tf.InputBlockSize];
                byte[] outData = new byte[tf.OutputBlockSize];
                int bytesRead = 1;
                while (bytesRead > 0)
                {
                    bytesRead = br.Read(data, 0, data.Length);
                    if (bytesRead == data.Length)
                        tf.TransformBlock(data, 0, bytesRead, outData, 0);
                    else
                        outData = tf.TransformFinalBlock(data, 0, bytesRead);
                    msOut.Write(outData, 0, outData.Length);
                }
            }
            msOut.Close();
            return base64EncodedFile = Encoding.ASCII.GetString(msOut.ToArray());
        }
    }
}

The decoder class that follows is the same except in reverse: its constructor takes in the Base64 ASCII data and converts it into a byte array wrapped in a memory stream. A BinaryReader then reads the memory stream to decode the attachment.

using System;
using System.IO;
using System.Text;
namespace InfoPathAttachmentEncoding
{
    /// <summary>
    /// Decodes a file attachment and saves it to a specified path.
    /// </summary>
    public class InfoPathAttachmentDecoder
    {
        private const int SP1Header_Size = 20;
        private const int FIXED_HEADER = 16;
        private int fileSize;
        private int attachmentNameLength;
        private string attachmentName;
        private byte[] decodedAttachment;

        /// <summary>
        /// Accepts the Base64 encoded string
        /// that is the attachment.
        /// </summary>
        public InfoPathAttachmentDecoder(string theBase64EncodedString)
        {
            byte[] theData = Convert.FromBase64String(theBase64EncodedString);
            using (MemoryStream ms = new MemoryStream(theData))
            {
                BinaryReader theReader = new BinaryReader(ms);
                DecodeAttachment(theReader);
            }
        }

        private void DecodeAttachment(BinaryReader theReader)
        {
            // Skip the fixed header (signature, size, version, reserved) to reach the file size.
            byte[] headerData = new byte[FIXED_HEADER];
            headerData = theReader.ReadBytes(headerData.Length);
            fileSize = (int)theReader.ReadUInt32();
            // The name length is stored in characters; each character takes two bytes.
            attachmentNameLength = (int)theReader.ReadUInt32() * 2;
            byte[] fileNameBytes = theReader.ReadBytes(attachmentNameLength);
            // The file name is stored as Unicode (UTF-16), matching the encoder.
            Encoding enc = Encoding.Unicode;
            // Drop the two-byte null terminator at the end of the name.
            attachmentName = enc.GetString(fileNameBytes, 0, attachmentNameLength - 2);
            decodedAttachment = theReader.ReadBytes(fileSize);
        }

        public void SaveAttachment(string saveLocation)
        {
            string fullFileName = saveLocation;
            if (!fullFileName.EndsWith(Path.DirectorySeparatorChar.ToString()))
            {
                fullFileName += Path.DirectorySeparatorChar;
            }
            fullFileName += attachmentName;
            if (File.Exists(fullFileName))
                File.Delete(fullFileName);
            // The using blocks close the writer and stream even if the write fails.
            using (FileStream fs = new FileStream(fullFileName, FileMode.CreateNew))
            using (BinaryWriter bw = new BinaryWriter(fs))
            {
                bw.Write(decodedAttachment);
            }
        }

        public string Filename
        {
            get { return attachmentName; }
        }

        public byte[] DecodedAttachment
        {
            get { return decodedAttachment; }
        }
    }
}

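To make the byte layout that both classes rely on a little easier to see, here is a minimal in-memory round trip. This is only a sketch of my reading of the Microsoft sample above, not code from the article: the class AttachmentLayoutSketch and its Pack and Unpack methods are hypothetical names, and it skips all the validation a real tool would want.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of the Microsoft sample): packs and unpacks
// the attachment layout entirely in memory, without touching the disk.
public static class AttachmentLayoutSketch
{
    public static string Pack(string fileName, byte[] content)
    {
        using (var ms = new MemoryStream())
        using (var bw = new BinaryWriter(ms))
        {
            bw.Write(new byte[] { 0xC7, 0x49, 0x46, 0x41 }); // attachment signature
            bw.Write((uint)0x14);                            // size
            bw.Write((uint)0x01);                            // version
            bw.Write((uint)0x00);                            // reserved
            bw.Write((uint)content.Length);                  // file size in bytes
            bw.Write((uint)(fileName.Length + 1));           // name length in characters, including terminator
            bw.Write(Encoding.Unicode.GetBytes(fileName));   // file name as UTF-16
            bw.Write(new byte[] { 0, 0 });                   // two-byte null terminator
            bw.Write(content);                               // raw file bytes
            bw.Flush();
            return Convert.ToBase64String(ms.ToArray());
        }
    }

    public static void Unpack(string base64, out string fileName, out byte[] content)
    {
        using (var br = new BinaryReader(new MemoryStream(Convert.FromBase64String(base64))))
        {
            br.ReadBytes(16);                                // skip signature, size, version, reserved
            int fileSize = (int)br.ReadUInt32();
            int nameByteCount = (int)br.ReadUInt32() * 2;    // characters to bytes
            byte[] nameBytes = br.ReadBytes(nameByteCount);
            // Drop the trailing two-byte terminator when decoding the name.
            fileName = Encoding.Unicode.GetString(nameBytes, 0, nameByteCount - 2);
            content = br.ReadBytes(fileSize);
        }
    }
}
```

Calling Unpack on the result of Pack("abc.doc", someBytes) should hand back the same name and the same bytes, which is a handy sanity check before pointing the real decoder at form data.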
They never delve into dealing with multiple attachments at any point. In fact, the code that touches the attachment in the XML is very small and always assumes a single file attachment, as can be seen in the code to save the attachment:

//Create an XmlNamespaceManager
XmlNamespaceManager ns = this.NamespaceManager;

//Create an XPathNavigator object for the Main data source
XPathNavigator xnMain = this.MainDataSource.CreateNavigator();

//Create an XPathNavigator object for the attachment node
XPathNavigator xnAttNode = xnMain.SelectSingleNode("/my:myFields/my:theAttachmentField", ns);

//Obtain the text of the node.
string theAttachment = xnAttNode.Value;
if (theAttachment.Length > 0)
{
    InfoPathAttachmentDecoder myDecoder = new InfoPathAttachmentDecoder(theAttachment);
    myDecoder.SaveAttachment(@"Path to the folder to save the attachment");
}

As can be seen from the SelectSingleNode() call, this code assumes only one file attachment. So this helped me understand the magic behind how the documents are encoded and stored in the XML of the InfoPath document, but it still wasn't giving me any glimpses into dealing with multiple attachments. Further searching brought me to the BizSupportOnline.net site and in particular their "Top 10 questions about InfoPath attachments" article. Eureka! I thought. This has to be it. They had multiple questions with answers and links to videos that would hold the key information I sought... except... their video links were all dead. 😡
My heart sank to new lows as my struggle entered day two.  I stumbled upon this article from Kunaal Kapoor titled "Getting InfoPath attachments from a submitted form" wherein he explains how it could be done using the XSD Visual Studio tool to build classes off the InfoPath template.  This seemed like a major departure from my current implementation, but desperate times you know...
So I tried it and as I suspected, it didn't work.  OK, back to my former path.  I know it's in there in the XML, I just have to find it.

OK, back to basics.  Rule #1 of debugging.  NEVER ASSUME ANYTHING!
So let's check, double check and triple check all our bases again.
Let's look at the attachment definition in the form.

OK, it's a simple field control, of type Picture or File Attachment (base64) with the Repeating option checked.  Nothing out of the ordinary here.
Next I decided to compare the files character by character, line by line. Time for my beloved Beyond Compare! If you don't have this tool in your toolbox, you're seriously missing out on productivity. The folks at Scooter Software do an awesome job with this gem of a tool. A definite must-have for every geek.
As I did my comparison and got to the end of the first attachment, this is what I noticed:

On the left is the original source file that had 3 attachments. On the right is the processed result with just one. Obvious, right? So then the question is... why do we only see one Attachment node in the code during processing? The answer lies in my presupposition that repeating data in standard XML is wrapped in a collection node, like this:

<xml>
  <LastName/>
  <Attachment>
    <File Name="abc.doc"/>
    <File Name="xyz.doc"/>
    <File Name="ugh.doc"/>
  </Attachment>
</xml>

Instead, the way InfoPath chose to represent multiple attachments in this case was as follows:

<xml>
  <LastName/>
  <Attachment Name="abc.doc"/>
  <Attachment Name="xyz.doc"/>
  <Attachment Name="ugh.doc"/>
</xml>

As a result, when my code was processing the child nodes of the XML document, looking for a match against the schema in order to rebuild the new target document, it found the first Attachment node, was satisfied, and moved on. If the attachments had been wrapped in a standards-based collection node, all the files would have been caught.
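
The difference between stopping at the first match and collecting every repeating sibling can be sketched roughly as follows. This is an illustration only, not the code from my tool: the class RepeatingNodeSketch, its method names and the myFields root element are all assumptions, and real InfoPath XML would also need the my: namespace passed through an XmlNamespaceManager.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical illustration: first-match lookup versus collecting all
// repeating sibling nodes with the same name.
public static class RepeatingNodeSketch
{
    // Shaped like the second XML example above; the root name is an assumption.
    public const string SampleXml =
        "<myFields><LastName/>" +
        "<Attachment Name=\"abc.doc\"/>" +
        "<Attachment Name=\"xyz.doc\"/>" +
        "<Attachment Name=\"ugh.doc\"/>" +
        "</myFields>";

    // Mirrors the mistake: SelectSingleNode returns only the first match.
    public static string FirstAttachmentName(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode("/myFields/Attachment");
        return node == null ? null : node.Attributes["Name"].Value;
    }

    // Collects every sibling that matches, so repeated nodes are not lost.
    public static List<string> AllAttachmentNames(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var names = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("/myFields/Attachment"))
        {
            names.Add(node.Attributes["Name"].Value);
        }
        return names;
    }
}
```

With the sample document, FirstAttachmentName only ever sees abc.doc, while AllAttachmentNames returns all three file names. In the real form each matching node's text would be the Base64 payload handed to the decoder shown earlier.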

It just goes to show you.  Rule #1 is so true.  Never assume anything.  Not even when the source is a large, reputable, multi-billion dollar company. 😎

<EDIT>
If you'd like to leverage the code I wrote to solve this, I posted a follow-up article the following day that can be found here:
http://blog.cjvandyk.com/2018/12/get-infopath-attachments-open-source.html
</EDIT>

Happy coding
C
