
String vs Inline String vs Shared String in Open XML

August 25, 2015

While working on a spreadsheet generation process, I came across three different ways to specify a string cell value in Open XML. That is confusing enough, but if you look it up on MSDN, this is all you see:

SharedString Shared String. When the item is serialized out as xml, its value is “s”.
String String. When the item is serialized out as xml, its value is “str”.
InlineString Inline String. When the item is serialized out as xml, its value is “inlineStr”.

ref: https://msdn.microsoft.com/en-us/library/documentformat.openxml.spreadsheet.cellvalues(v=office.14).aspx

Obviously, that doesn't explain why I should choose one type over the others. So, for people like me who are looking for the basic differences, here's a summary of what each type is:

String

The String (str) type is meant for a cell that contains a formula whose result is text.

<c r="A4" t="str">
<f>CONCATENATE(A2,A3)</f>
<v>Hello World</v>
</c>

The f element stores the formula itself as text, and the v element caches the value the formula produced the last time it was calculated. A reader that doesn't calculate formulas can use that cached value directly. If v is missing, the value has to be calculated by an application that evaluates formulas, such as Excel; the Open XML SDK itself does not evaluate formulas.

Shared String

When you use a Shared String to represent your string, the file can be drastically smaller, especially if it contains a lot of repeating strings. Each distinct string is stored only once, in the workbook's shared string table (the sharedStrings.xml part), and the cell stores only the zero-based index of that string in the table. When a string is added, the writer (your code, or Excel) checks whether it is already in the table. If it is, the cell simply gets the existing index. If it isn't, the string is appended to the table, and its new index is stored in the cell and reused whenever that string appears again.

<c r="A1" t="s">
<v>0</v>
</c>

The v element holds the index into the shared string table; 0 refers to the first string in the table.
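To make the lookup-or-append bookkeeping concrete, here is a minimal in-memory sketch in C#. SharedStringIndex is a hypothetical helper, not part of the Open XML SDK; it only models how each distinct string is stored once and how the zero-based index that ends up in a cell's v element is chosen. It does not read or write an actual sharedStrings.xml part.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the Open XML SDK): an in-memory model of a
// shared string table. Each distinct string is stored once; cells keep only its index.
public class SharedStringIndex
{
    // Maps a string to its position in the table for fast duplicate checks.
    private readonly Dictionary<string, int> _indexByText = new Dictionary<string, int>();

    // The table itself, in insertion order (one <si> entry per item in a real file).
    private readonly List<string> _items = new List<string>();

    // Returns the zero-based index to write into a cell's <v> element,
    // appending the string to the table only the first time it is seen.
    public int GetOrAdd(string text)
    {
        if (_indexByText.TryGetValue(text, out int existing))
        {
            return existing;
        }

        int index = _items.Count;
        _items.Add(text);
        _indexByText[text] = index;
        return index;
    }

    // Number of distinct strings stored in the table.
    public int Count => _items.Count;

    // Looks up the text a cell's index refers to.
    public string this[int index] => _items[index];
}
```

Calling GetOrAdd("Pending") twice returns the same index both times, while the table still holds a single copy of the text, which is where the file-size savings come from.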

Inline String

Inline String lets you save the text directly in the cell instead of saving a reference to a shared string. The cell's t attribute is set to inlineStr, and the text goes in an 'is' element, which stands for inline string. The is element uses the same rich-text structure as an item in the shared string table, so it can hold plain text in a t element or formatted runs.

<c r="A1" t="inlineStr"><is><t>Test String</t></is></c>

This is much easier to code and to read. However, for data containing strings that repeat, it becomes inefficient, because every occurrence of a string is stored in full.

That's just the basic difference. Pick your option based on file size versus simplicity and readability.
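To see the three cell forms side by side, here is a sketch that builds them with LINQ to XML (System.Xml.Linq) rather than the Open XML SDK, so only the raw SpreadsheetML markup is involved. CellXmlSketch and its method names are hypothetical. Each method returns a single c element, not a complete worksheet or workbook, and the shared string version assumes the index already points at an existing entry in the shared string table.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper: builds individual SpreadsheetML <c> elements by hand.
public static class CellXmlSketch
{
    // SpreadsheetML main namespace used by worksheet parts.
    public static readonly XNamespace Ns = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // t="s": the cell stores a zero-based index into the shared string table.
    public static XElement SharedStringCell(string reference, int sharedStringIndex) =>
        new XElement(Ns + "c",
            new XAttribute("r", reference),
            new XAttribute("t", "s"),
            new XElement(Ns + "v", sharedStringIndex));

    // t="inlineStr": the text lives in the cell itself, inside <is><t>.
    public static XElement InlineStringCell(string reference, string text) =>
        new XElement(Ns + "c",
            new XAttribute("r", reference),
            new XAttribute("t", "inlineStr"),
            new XElement(Ns + "is", new XElement(Ns + "t", text)));

    // t="str": a formula (<f>) plus its cached text result (<v>).
    public static XElement FormulaStringCell(string reference, string formula, string cachedResult) =>
        new XElement(Ns + "c",
            new XAttribute("r", reference),
            new XAttribute("t", "str"),
            new XElement(Ns + "f", formula),
            new XElement(Ns + "v", cachedResult));
}
```

Building the XML by hand keeps the differences visible: only the t attribute and the child elements change between the three forms.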

Nugget of Wisdom

July 26, 2015

True prosperity is the result of well placed confidence in ourselves and our fellow man.

~ Benjamin Burt

Creating Left Joins and Aggregates using Entity Framework

July 21, 2015

One of the deficiencies that I run into from time to time in Entity Framework (EF) is handling parent-child relationships where the list of child elements may be empty, unless I define that relationship in the Entity class itself.

For example, consider these two classes:

using System.ComponentModel.DataAnnotations;

public class Order {
    [Key]
    public int OrderId { get; set; }
    public int CustomerId { get; set; }
    public string Name { get; set; }
}
public class OrderDetail {
    [Key]
    public int OrderDetailId { get; set; }
    public int OrderId { get; set; }
    public string ProductCode { get; set; }
}

In EF, I can create a relationship between these by modifying the Order class like this:

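using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

public class Order {
    [Key]
    public int OrderId { get; set; }
    public int CustomerId { get; set; }
    public string Name { get; set; }
    // Navigation property: the OrderDetail rows whose OrderId matches this order
    [ForeignKey("OrderId")]
    public List<OrderDetail> OrderDetails { get; set; }
}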

Then, when I’m using EF to select from these tables, EF will automatically include my join and get the OrderDetails for me, as long as I do something like this:

var list = _dc.Orders
    .Include("OrderDetails")
    .ToList();

This setup will give me all of the orders, even if the OrderDetails collection is empty, but if it’s not empty I’ll get the details as well, which is great.

The trouble I find sometimes is that I may want to create a join that works just like this, but that doesn't require me to define the relationship in the Entity class definition. For example, keep considering the classes above, but now add this one into the mix:

public class Customer {
    [Key]
    public int CustomerId { get; set; }
    public string Name { get; set; }
    public bool IsActive { get; set; }
}

Suppose that I want to get all of the active customers, and a count of all of the orders each customer has. Sure, I could modify the Customer Entity class to include a reference to the Order class, but if I do that, EF will need to consider that relationship (and decide whether it wants to include a query to get the Order results) every time I access my Customer class. Let's suppose that this produces more overhead than I want in this case, because there are lots of times when I will want to access the Customer entity without caring whether there are any orders. My first reaction is that I can just write a join in my EF query, like this:

var list = _dc.Customers
    .Join(_dc.Orders, c => c.CustomerId, o => o.CustomerId, (c, o) => new { Customer = c, Order = o })
    .Where(x => x.Customer.IsActive)
    .GroupBy(x => new { x.Customer.CustomerId, x.Customer.Name })
    .Select(x => new { x.Key.CustomerId, x.Key.Name, OrderCount = x.Count() })
    .ToList();

This will give me what I'm looking for (the customer fields for active Customers, plus a new field called "OrderCount" that counts the number of Order records associated with each Customer), but it has a major drawback: I'll only get results for customers that already have a record in the Order table. Any customers that haven't placed an Order yet won't be returned, because they fail the inner join condition. This can be a major problem, but I'm glad to say that LINQ to Entities has an answer for this scenario. Here's what the answer looks like:

var list = _dc.Customers
    .GroupJoin(_dc.Orders, c => c.CustomerId, o => o.CustomerId, (c, o) => new { Customer = c, Orders = o })
    .Where(x => x.Customer.IsActive)
    .Select(x => new { x.Customer.CustomerId, x.Customer.Name, OrderCount = x.Orders.Count() })
    .ToList();

The important point to notice here is that .Join was replaced with .GroupJoin, which also makes the .GroupBy unnecessary.

.GroupJoin works just like a regular join, but instead of creating results that exist at the same hierarchical level as the first table, it gathers the matching records into an IEnumerable<T> as children of the parent. In other words, it creates the same sort of relationship we get when we modify our Entity classes, as we did with the Order and OrderDetail classes back at the beginning. This means that we don't need a .GroupBy statement, because the results are already organized into a tiered anonymous type. That's very cool!

A group join also keeps every parent row: a customer with no matching orders simply gets an empty Orders collection instead of being dropped, so the query behaves like a LEFT JOIN rather than an INNER JOIN. Since I'm using an aggregate of the child table (.Count), I get 0 for customers with no rows in the Order table, and the actual count for everyone else. That's exactly what I was looking for.

.DefaultIfEmpty comes into play when you want a flattened left join instead: one row per customer/order pair, with a null order for customers that have none. For that, it is applied to each customer's group inside a .SelectMany, as in .SelectMany(x => x.Orders.DefaultIfEmpty(), (x, o) => new { x.Customer, Order = o }). Calling .DefaultIfEmpty on the outer query after the .Where would not change the join at all; it would only add a single default (null) row when no active customers exist, which the .Select is not written to handle. That's why it isn't part of the aggregate query.

Here is the same query, but this time it outputs into a List of a pre-defined class called "CustomerOrder", which includes both an aggregate of the orders (OrderCount) and a List<Order> of the orders themselves (Orders):

var list = _dc.Customers
    .GroupJoin(_dc.Orders, c => c.CustomerId, o => o.CustomerId, (c, o) => new { Customer = c, Orders = o })
    .Where(x => x.Customer.IsActive)
    .Select(x => new CustomerOrder() {
        CustomerId = x.Customer.CustomerId,
        Name = x.Customer.Name,
        OrderCount = x.Orders.Count(),
        Orders = x.Orders.ToList()
    })
    .ToList();

Now my variable contains a List<CustomerOrder>, which I can return from my data layer, and I have a strongly-typed set of objects to use for my business logic.  Perfect!
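The behavior of these operators can be checked without a database. The sketch below runs the same query shapes with LINQ to Objects over in-memory lists. Customer and Order are simplified stand-ins for the entity classes (no EF attributes), JoinDemo is a hypothetical name, and the CustomerOrder definition is an assumption, since the post doesn't show it. LINQ to Entities translates these operators to SQL instead, so this demonstrates the shape of the results, not the SQL that EF generates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified in-memory stand-ins for the entity classes in this post.
public class Customer { public int CustomerId { get; set; } public string Name { get; set; } public bool IsActive { get; set; } }
public class Order { public int OrderId { get; set; } public int CustomerId { get; set; } public string Name { get; set; } }

// Assumed shape of CustomerOrder (not shown in the post).
public class CustomerOrder
{
    public int CustomerId { get; set; }
    public string Name { get; set; }
    public int OrderCount { get; set; }
    public List<Order> Orders { get; set; }
}

public static class JoinDemo
{
    // Inner join + GroupBy: active customers without orders drop out entirely.
    public static Dictionary<string, int> InnerJoinCounts(IEnumerable<Customer> customers, IEnumerable<Order> orders) =>
        customers
            .Join(orders, c => c.CustomerId, o => o.CustomerId, (c, o) => new { Customer = c, Order = o })
            .Where(x => x.Customer.IsActive)
            .GroupBy(x => new { x.Customer.CustomerId, x.Customer.Name })
            .ToDictionary(g => g.Key.Name, g => g.Count());

    // GroupJoin: every active customer appears, with an empty Orders group when there are none.
    public static List<CustomerOrder> GroupJoinCounts(IEnumerable<Customer> customers, IEnumerable<Order> orders) =>
        customers
            .GroupJoin(orders, c => c.CustomerId, o => o.CustomerId, (c, os) => new { Customer = c, Orders = os })
            .Where(x => x.Customer.IsActive)
            .Select(x => new CustomerOrder
            {
                CustomerId = x.Customer.CustomerId,
                Name = x.Customer.Name,
                OrderCount = x.Orders.Count(),
                Orders = x.Orders.ToList()
            })
            .ToList();

    // Flattened left join: one row per customer/order pair, with a null OrderId
    // for customers that have no orders (DefaultIfEmpty applied to each group).
    public static List<(string Customer, int? OrderId)> LeftJoinRows(IEnumerable<Customer> customers, IEnumerable<Order> orders) =>
        customers
            .GroupJoin(orders, c => c.CustomerId, o => o.CustomerId, (c, os) => new { Customer = c, Orders = os })
            .Where(x => x.Customer.IsActive)
            .SelectMany(x => x.Orders.DefaultIfEmpty(), (x, o) => (Customer: x.Customer.Name, OrderId: o?.OrderId))
            .ToList();
}
```

With one active customer who has two orders, one active customer with none, and one inactive customer, InnerJoinCounts returns only the first customer, GroupJoinCounts returns both active customers (the second with an OrderCount of 0), and LeftJoinRows returns three rows, one of them with a null OrderId.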

Securing the Future of the Web

July 13, 2015

In recent months I've seen rumblings of change on the web. It would seem that in an era of spying, we are seeing a real push to secure all traffic that moves on the web. Back in December, the Chrome Security Team proposed a new standard to indicate the non-secure status of a webpage. In their view, people have become used to seeing an indicator when a website is secure, but they generally don't see the lack of an indicator as "proof" of insecurity. More recently, Mozilla, the team that works on Firefox, has proposed deprecating non-secure HTTP altogether. Their approach would gradually raise the expectation for security, with new browser features made available only to secure websites.

In a way, this is the end of an era. Running your site securely has generally been the domain of shopping, banking, and honestly anywhere you enter a password. This would move us to a web where every site runs securely. This does add an additional burden for folks who wish to maintain a presence on the web, but there are already many costs involved, from purchasing and maintaining a domain name (or many) to the hosting fees. This will now be part of the "new normal" for web operations.

Periodic Table of Content Marketing

July 10, 2015

Ran across this from Econsultancy: an incredible (and very creative) presentation of the complex world of today's Digital Marketing environment.

(Image: The Periodic Table of Content Marketing)

Navigating the wealth of digital marketing channels is daunting to say the least – particularly if you want to have half a chance at reaching an acceptable level of return on investment (ROI).

At SNQ – we’re here to help (and yes we love this stuff)!

Scott

www.statusnotquo.com

My Getting Better Moment

June 26, 2015

Many years ago when I was a young programmer, I screwed up. It wasn’t a normal “I spilled coffee” type of screw up, it was the kind that makes you dizzy when you think about just how much money it cost the company. This is the story of that screw up, and how it changed my mentality as a software engineer.

My first job out of college was for one of America's big financial institutions. I came on board in the Spring of 2007. For those who don't recall the events of 2007, do you remember the boulder scene from Indiana Jones and the Raiders of the Lost Ark? That is a perfect metaphor for the economy at that time. Indiana Jones was your average financial institution, and the credit crunch, congressional bills, and shareholders were the traps that could trip you up and end you at any moment.

 

(Image: Boulder)

 

In my first year, I spent most of my time reacting to financial regulation bills. I was given tasks, and I dove right in. The systems varied wildly. Each project had its own technologies and tools. One day I would be working in COBOL, and the next in C# (and yes, in 2007 COBOL systems were still being developed!). I would make a change and kick it over to QA, who would of course tell me if anything didn't work as expected.

The cowboy approach paid off. I was praised for my ability to get things done quickly. Perhaps worst of all, it worked, at least for a while.

About a year in, I was given a task to write a data import process for one of our new systems. The system used a proprietary language written by a small company, which I’m convinced must have had the best salespeople in the world. The language itself was derived from VB6, and used a Caché database. I completed the task with little thought, and moved on.

Days passed, then weeks, and months. One day I was called into my VP's office. The VP was there, along with my team lead, manager, one of our senior programmers, and several programmers from the company that wrote the proprietary language used for the system. The second I walked into the room I knew something was very, very wrong.

Apparently, the data import I wrote, which originally ran in about an hour, was now taking about 10 hours to run, getting slower each day! After much research by the external company’s programming team, they found the source… my import. For each record that I was updating in my data import, I was also creating a new empty record in the database, and this was slowing the system down to a crawl. My hastily thrown together program had created hundreds of millions of empty records in the production database.

While the company placed the blame at the feet of the language and its vague syntax, I knew that I should have caught it myself. My days of "conquering" a problem as fast as possible needed to end. Code organization, architecture, readability, and testing needed to become my priorities if I wanted to continue to grow as a software engineer. I let books like "Code Complete", "Clean Code", and the Gang of Four's "Design Patterns" help me redefine what being a good software engineer means to me.

So that’s my Getting Better Moment. What’s yours?

Me first, me first!

June 16, 2015

I've spent a lot of time driving in southern California, averaging over 35,000 miles per year for twenty years. Thankfully, I don't drive quite that much anymore, but I still drive quite a bit. One thing I've noticed over the years that's become more and more prevalent is what I call the "me first" syndrome. There are no cars behind you for literally over a mile, but the driver next to you will speed up to move over in front of you (often forcing you to hit the brakes). When lanes merge, drivers will freely use the emergency lane or shoulder to get just a few more cars ahead, merging at the last second. The concept of taking turns appears to escape them completely. And while this is admittedly very annoying when driving, it's the impact on organizations that's more concerning.

The mindset of me first and the devil take everyone else is detrimental to an organization. Even in a company filled with individual contributors, interaction and resource allocation have to occur. If everyone prioritizes their wants and needs above those of everyone else, the end result is fighting over resources, strife-filled relationships, inefficiencies and even failure. Sooner or later, compromise has to happen, whether it is as minor as letting a colleague "go first" so they can make a meeting, or as major as giving up or sharing resources, which makes your own project more challenging to complete but supports the collective success of multiple projects. And a me first attitude precludes even the idea of compromise, negatively impacting you, your colleagues, and your organization. Compromise doesn't mean losing, or giving up everything to someone else. Compromise is achieved by "…each side making concessions."1 Give a little, get a little, and everyone succeeds.

One of our clients was going through their annual budgeting process. One manager suggested that they set the deadlines, send out budget templates, and "just hold the managers to the dates – it's part of their job." I've seen the results of that approach before: missed deadlines, inaccurate information, and more time spent following up. And the end result is often an inaccurate and poorly understood budget with no buy-in from the team; basically, a waste of time. Our suggestion was to make the choice to support the managers, by scheduling budget building sessions around their schedules and providing a guided walk-through of the process. Yes, this was painful for the finance team; we were already short on resources and this was going to be a major time commitment. The managers had to commit to living by the budget they built (no outs by saying it was built by finance, or that they didn't understand it). But the end result? A solid budget, understood by all of the management team, delivered on schedule, with the entire management team committed to living by it.

So the next time the lane you’re in on the freeway merges with another, why not let that car next to you merge in?  You’ll both get to your destination more quickly, with less stress – and shouldn’t that be your goal?

1 https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=compromise%20definition
