Attendees received a copy of his latest book – Misbehaving, which I’m making my way through (about 50% done) and now own in paper, digital, and audio formats… – and had the opportunity to hear him speak for about an hour, sharing anecdotes about behavioural economics and answering a few audience questions.
I tried to ask a question but was not picked from the audience, so I’m documenting it here in the hope that readers can help me with pointers or answers (of course, I’d be thrilled if Prof. Thaler would address it himself).
My question[s] – with some background but hopefully not annoyingly “monopolizing the mic”:
On one side of the spectrum, we know that behavioural factors play a huge part in individuals’ transactions – choosing to donate organs, saving for retirement, … On the other, we see high institutional ownership of shares and, to my knowledge, the significant majority of stock trades are either algorithmic or at least “professional”, which we expect to fall under the purview of efficient markets, etc…
This is relevant to my interests in information security, as we need to determine which kinds of programs or actions should be treated as more “behavioural” and which as more “rational”. At what point should the actions of agents be modeled one way or the other?
I’m always attempting to learn more, so maybe this is just a naïve question that’ll be answered further down my studies, but would love to hear insights.
Any ideas? Comment below or reach out to me.
- I did not watch the keynotes, so I may have missed any specific set-up done by the larger vendors in their original pitches.
- The company I work for did not have a booth, so my skepticism might seem self-serving. Besides assuring you it is not, not much else I can do…
- Reminder: As always, opinions are my own 🙂
- Authority – a request or message coming from someone of [perceived] authority yields better compliance. Think ‘people in lab coats discussing medical products on TV’.
- Scarcity – if something is framed as being in short supply (units or time), or otherwise restricted, it will exert more influence. “Only good for 24hrs!” kind of messaging.
- Liking – if the person or entity requesting something is someone we “like”, we tend to comply a lot more often.
- Social Proof – the example (real or not) of someone similar to you doing something resonates extremely well.
- Reciprocity – should you receive a ‘gift’ from someone, your receptiveness to their requests increases significantly.
- Consistency – finally, if someone is able to frame a request in a way that is compatible with how you perceive yourself, there’s a higher likelihood you’ll comply.
- Suits. Suits everywhere. Anyone working in a “senior” capacity in business development, sales, etc… was likely wearing a suit. Some of the smaller booths had senior people in the standard booth uniform, but to me that was meant to signal something else – that the company has enough people – so it’s understandable.
- An interesting observation on authority. As I walked the floor, I looked at the wording and visual aspects of the various booths. Larger booths from more familiar brands had very clear messages that were just the brand itself or basic functionality about their offering (“DDoS Protection”, “Malware Analysis”, “User Behaviour Analytics”, …). Smaller booths – disproportionately housing smaller companies – however, had much more emphatic messages: “Leader” in this, “Complete Security” in that, “Best of” whatever. This, to me, is a clear appeal to authority.
Funny enough, though, there were at least two exceptions that I thought were notable:
- A very prominent software vendor had a relatively large floor presence in the North Hall, yet carried the same “look at me” style of messaging by calling themselves “the global leader in…”.
- A very large software company comfortably situated in the Fortune 100 list had a *tiny* booth in the North Hall, alongside upstarts, and it carried the same messaging as the upstarts (“Maximum security”). Frankly, if they couldn’t afford to pay for at least a mid-sized booth, what were they even doing there?
- Every vendor was unique. Vendors seem to dislike being framed in the same category as others. Every one has a peculiar element that makes them unique. This is extremely useful when trying to explore ‘scarcity’ as a trigger. “We’re the only UBA with strong crypto analytics and threat intel feeds” or something along those lines. If you believe that vendor to be unique, how will you consider alternatives?
Liking is inherent to a trade show:
- You’d be hard pressed to find a “sad” face anywhere on the expo floor. Sure, some organizations (such as government agencies or non-commercial entities such as business development offices) may have less appetite for easy banter, but mostly everyone else was “happy”.
- Liking also extended to the vendor allowing you to do nice things, such as going ‘Office Space [slightly NSFW]’ on older equipment, shooting Nerf guns, or letting you meet a trendy actor.
- Conference Tchotchkes/Trinkets. From Star Wars lightsabers, to USB fans, to drones, to stress balls, to pens, … one could fill volumes of luggage with all the giveaways. They are a clear appeal to reciprocity, along with the drinks/popcorn/… served throughout the expo floor. Personally, I liked the popcorn. 🙂
- Conference Events/Parties. Not only do you get to enjoy the giveaways on the expo floor, you can also join your vendor for a bash afterwards.
- Social Proof was on display in every mention of how many thousands of people attend the conference, as well as in the consistency of the overall materials – from the Norse lanyards to the many “(ISC)2” ribbons attached to the badges. The message is clear: “you’re all part of the same community”. Not a bad message overall, of course, but also a nudge that if people are looking at a particular demo/booth, hey, you’re not so different from them and maybe you should look too…
- Consistency seems to come afterwards. After you scan your badge at the booths – either as a condition to get the aforementioned trinket or just because you’re around watching a demo – the inevitable post-RSA email arrives: “You visited our booth and had interest in our solution. How would you like to schedule a sales call/demo?” (thanks to @MeneghelAna for helping me dissect this usage).
- Quite a few vendors – large & small – had a presence in BOTH the North and South expo halls. Marketing budgets must have been plenty this year…
- Lots of ‘Endpoint’ solutions, alongside ‘Analytics’.
- Too many ‘pew pew’ maps, including in 3D!
So, in essence, a skeptical walk through the expo floor sees many examples of influence. Be aware (and beware…) of it, at RSA and elsewhere.
Lots of people (particularly in our echo chamber) have very negative opinions on the conference. I’m not one of them. I really like the opportunity to learn interesting perspectives from the sessions (sure, some may be ‘basic’, but we’re not all experts at everything, are we?) and I *love* the opportunity to catch up with people I only see at conferences.
That being said, I struggle to find value in the expo floor. Sure, it is a great arena to run into folks, but for other interactions (looking at new products/technologies, chatting up with your friendly vendor, …) there are better options, IMHO.
This is no longer the age of COMDEX.
- Endowment Effect. This is the notion that if you happen to “own” something, you value it more than if you don’t.
- Loss Aversion. Somewhat related to endowment, this is the key insight that one feels the pain of losing a certain amount ‘x’ more strongly than the pleasure of gaining that same amount ‘x’ (a quick sketch follows this list).
- Availability Bias. You’ll attribute more importance/frequency to information that you have come across recently.
- Cognitive Dissonance. The stress caused by holding contradictory thoughts and the rationalizations that are done to resolve this.
- Social Proof and variations (group bias & others). When one assumes the behaviours of others to be correct.
- Sunk Cost Fallacy. Continuing to invest in something because so much has been spent on it already.
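To put a rough shape on loss aversion, here is a tiny R sketch using the value function commonly associated with Kahneman and Tversky. The parameter values are the frequently cited ones and are used purely for illustration – this is my own aside, not something from the sources above.

v <- function(x, alpha = 0.88, lambda = 2.25) {
  # gains are valued as x^alpha; losses are scaled up by lambda (loss aversion)
  ifelse(x >= 0, x^alpha, -lambda * (-x)^alpha)
}
v(100)    # subjective value of gaining 100
v(-100)   # subjective value of losing 100 – roughly twice as large in magnitude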
- due to a desire to resolve any cognitive dissonance, you’ll hold a generally more positive opinion of that vendor. “If I went through the effort of certifying on that vendor’s product *and* I consider myself a good person, then that vendor must be good too.”
- because of the endowment effect, you’ll likely hold a more positive opinion of others who have the same certification. This may come through on sales calls, hiring, etc…
- the availability bias will kick in when thinking of alternatives, meaning you may have an easier time recalling a specific vendor’s offerings or technology, particularly if they refer to [re]certification topics.
- social proof will kick in when you see that certification prominently displayed by vendors at trade shows and similar events. Vendors often offer certification exams at their shows (sometimes even waiving the exam fees): it is extremely convenient for the test taker, but the visual of hundreds or thousands of your peers taking those exams is a shining example of social proof in action.
- it’ll likely be really difficult to let go of that cert, or that particular vendor. That communication saying “your certification has now expired” is really painful. Such is the impact of the sunk cost effect (and loss aversion).
- They can provide a roadmap for learning and checkpoints for measuring your skills.
- They can be a very effective (though not perfect) means of resolving the information asymmetry inherent in professional situations, both as signals and as screens.
- They can help establish relationships with like-minded professionals.
This is a topic I’ve been meaning to write about for a while. I’d love to receive feedback on it: please, let me know your thoughts… (It got a little long, so bear with me.)
- “Should I get <insert cert name>?”
- “Is <insert cert name> a good cert to have?”
- “Why does HR insist on having <insert cert name> as requirement even though I know WAY more than that?”
- “Wondering if I should keep my <insert cert name> or let it lapse”
- “What do I need to do to pass <insert cert name>? Any brain dumps? ;-)”
Please note: For many of the points below, one can almost replace “certification” with “degree”. The discussion of whether or not to get a degree – College, Bachelors, Post-Graduate, … – is, in my opinion, deeper than the certification one, with much more significant implications. Let’s treat that one separately, shall we? Baby steps…
In any economic transaction, information asymmetry is the notion that the parties to a transaction have different information given their roles, and that each will alter their behavior to maximize their own utility. As a buyer, you may not know as much about the quality of the product you’re buying as the seller does. Conversely, as a seller, you don’t know how much the buyer is willing to pay for the goods you’re selling, or even whether they can actually pay for them.
This is no judgement on either party, but an inherent characteristic of the economic transaction itself: only you know how badly you want a particular car, just as the previous owner of the car knows how well it’s been taken care of over the years.
- The ‘over-informed’ party can SIGNAL to the under-informed party by presenting information that attempts to resolve the asymmetry. Examples: “this is a ‘certified pre-owned’” or “here is my latest pay stub to show that I’m good for credit”.
- The ‘under-informed’ party can SCREEN the over-informed party by asking for information or offering choices that force the other to reveal that information. Examples: “give me three references from your career”, “show me your insurance policy against errors & omissions”.
Also important to recognize is that there is a cost associated with both signaling and screening, and that this cost can be a signal in its own right. A signal known to be expensive to generate may be read as a stronger signal of commitment, and a complicated screening process may indicate how important the decision is – and therefore the value of whatever is being bought.
The study of information asymmetry has been worthy of Nobel prizes – George Akerlof, Joseph Stiglitz, and Michael Spence shared the 2001 Economics prize on this topic. At the risk of sounding geeky, I think this is truly fascinating stuff…
- “Here is my <insert cert name>” signals that you [possibly] have the skills/knowledge/experience associated with that cert.
- “This position requires <insert cert name>” is a screening mechanism meant to easily (from the point of view of the recruiter) winnow out candidates that have a low likelihood of having the necessary skills/knowledge/experience. It forces candidates to demonstrate at least some commitment to that area.
- the content of a certification may not be relevant to the true skills/knowledge/experience required, but may still be considered adequate or even required.
- the certification process may be broken and allow those without the skills/knowledge/experience required to still obtain the certification.
- the cost of obtaining the certification may become an impediment and artificially screen out candidates that would otherwise be suitable.
- and so on…
Nevertheless, they are useful heuristics to be applied to the true problem at hand: reducing information asymmetry. If we focus on that, we can provide better advice. Let’s try to put that to practical use…
- The certification is part of a formal gate in a process: be it a promotion, formal tender, partner requirements, etc… In this case it’s pretty simple: if you [often] find yourself in that formal process and you want to continue, get the certification.
- The certification is to be used as an informal roadmap for learning. I do this often (see disclosure below). In that case, ask yourself: how high is the marginal cost of actually obtaining the certification once your studying is done? If you use the cert as a roadmap, study a lot, and then only need to sit a simple exam, it may be worth actually getting it. If, on the other hand, the preparation for the actual certification is arduous and/or the exam is expensive (CCIE/CCDE, VCIX/VCDX, SANS GSE come to mind), then you may choose to skip it.
- The certification “will help in getting something (job, position)” but is not formally required. This is where the “information asymmetry” shows up and you can reframe the question as “can I resolve the information asymmetry in another way?”. If you’re a professional hoping to break into a new field (regardless of this being your first job or just a career transition), a certification may help. If, on the other hand, you have a meaningful alternative – maybe recommendations, a portfolio, blog posts, professional reputation, … – then that certification may not be necessary.
- Those that think the certification is “necessary & sufficient” for a role, when in fact recruiters look at the cert as “just a signal”. Unfortunately, those candidates are often vulnerable to aggressive and potentially misleading advertising from those offering certifications or prep courses.
- Those unceremoniously dismissing the certification as “useless”. I think they often do it because they themselves have – consciously or not – enough experience/reputation to resolve the information asymmetry, but fail to see how someone breaking into the field might not be as fortunate.
- Is the cert widely used in industry as a gating mechanism, or generally respected in something you take part in often? Might be a good cert to have.
- Does the cert provide a good roadmap of self-learning? Might be worth pursuing. Here I mention that while I never got my CCIE, I used the blueprints as a reference of topics to brush up on in network security.
- Finally, for “having the cert just in case”, it is helpful to think about it in terms of “how well does this certification resolve the underlying information asymmetry?” If you’re trying to signal broad understanding of an area, getting a specialized certification may not be as helpful. The reverse is true, of course: a generalized cert is useless if your signal is meant to be about a specific area. Also, keep in mind the value that industry/market places on the cert as a good signal mechanism. Things change over time…
HR does this because that certification has been, in their opinion, a useful heuristic to screen candidates. It may not be accurate from your perspective, but HR is making the rational decision that the cost of screening candidates via their certification signal is a good trade-off for the value they are getting. It’s not personal, it’s not stupid, it’s basic economics.
Whether this is a big issue for a candidate depends on how much flexibility they have with the hiring process. If you’re being formally evaluated within a broad pool of possible candidates, you may have little choice but to go for it. If, on the other hand, you have both another way of resolving the asymmetry implied by requiring the cert AND flexibility in the process (maybe you know the hiring manager and can bypass that requirement), go ahead and try that.
In this case, reframe it as “do the benefits of choosing to send this signal outweigh my own individual cost”? The cost may be clearly monetary or primarily the time needed.
Also, if you’re a more experienced professional, thinking of “can I resolve the information asymmetry in another way?” also helps. Maybe you lapse your professional certification, but you have a portfolio of blog posts, community participation, public code, … that are alternatives for showing what the certification was meant to show. It may be OK to let go of your introductory-level certification in a field where you can show expertise differently…
- the value of having that particular cert as a valid signal may diminish.
- the screening effort will increase, both from the certification provider and from potential employers. We see this happening with more stringent testing requirements and perhaps more obscure questions (in both testing and interviews), all of which raise the cost of the screening itself. Expect that cost increase to manifest itself in more expensive exam fees, or even more stressful hiring processes…
- Understanding certifications as both signal and screen mechanisms.
- Considering the “transaction costs” and “opportunity costs” of both obtaining the certification OR using it as a screening mechanism.
Hoping this contributes a bit as one considers which certs to embark on, or which certs to list in those job descriptions…
For my own career, I’ve let many certs lapse, not because they were good or bad, but because I evaluated that my personalized “signaling” cost (i.e. keeping the certification) was too expensive given the expected benefit. Others I plan to keep, since either the signaling cost is low enough, or they offer other benefits (tangible or not) that I value…
For the record, I cherish my CISSP designation. It means a lot to me, not so much for the technical knowledge itself (it was over 15 years ago…) or the inherent signal (many have it, and it has many supporters & detractors), but for reminding me of the never-ending quest to bring excellence to the InfoSec profession.
Finally, as a lifelong learner, I like to look at certifications as a rough guide to the common knowledge of a particular area. I may choose to just review the blueprint/requirements and guide my own studies along those lines. In some cases, I may go further and consider acquiring the certification as a personal goal or as a ‘sanity check’ that I do indeed have the minimum knowledge. After all, I’m always aware of the dangers of the Dunning-Kruger effect, though not always able to avoid it…
As I read those posts, it’s uncanny how true to form I stayed over the past 5+ years, and how much of the same problems remain…
- I still use the same “modified GTD/InboxZero” approach. It has resisted the test of time pretty well.
- I still keep the same type of inboxes, but with more emphasis on Twitter now. Tweetbot is my primary interface, with the occasional glance into the official Twitter apps on iOS or web interface.
- My personal knowledge system (lots of mind-maps) keeps growing, though some maps show their age. If anything, they now serve as a jumping off point to newer information. Also, I’ve standardized on keeping those files “on the cloud” (currently Dropbox).
- I still use “Read it Later” (now renamed Pocket), and still struggle with how to extract information from it in a meaningful way. My “reading list” is now several thousand articles long (yeah, good luck clearing that…).
- I still use Evernote, no longer as bookmark manager, but for writing and note taking. I have 3 notebooks:
- a local (not synced) work notebook for notes from customer meetings, etc…
- a shared notebook with my family for local notes. rarely used, though.
- a personal notebook where everything else goes. This blog post started as a note there.
- I still use Mind Manager, now on the Mac. Not as powerful as the Windows version, but good enough for me.
- MLO is a wonderful tool, but not available on Mac. I switched to Things, which is not perfect, but does a very good job. One thing I really got into is being able to use it on desktop and mobile platform (iOS). This was something I didn’t care much for back then, but have grown fond of.
- RIP “Google Reader”. Now I use Feedly, and links of interest (that didn’t show up on Twitter first), get sent to Pocket.
- Not related to the tools directly, but I now choose to support the paid versions of these tools (Evernote, Pocket, Feedly, …) whenever I can. It’s affordable and I feel good doing a little bit to keep these tools running…
- I come across LOTS of interesting content on Twitter – links to articles, specific images, … Lots of this interesting content gets saved into Pocket, but I only go back to them when using Pocket’s own search capability.
- It is unrealistic to expect I’ll read all on my Pocket list, or that I’ll ONLY save stuff on Pocket that I’ll surely read later.
- Oftentimes I’ll struggle to find something I just *know* I came across before. This is worse for images (memes et al.).
- I get the nagging sense I should be able to leverage Evernote better, but not sure how.
I was extremely fortunate to be able to attend my first SIRAcon last week: it’s not often that one of those ‘aspirational’ conferences happens at just the right time (I found a way to fit it into my schedule), not too far from home (Toronto to Detroit is not too far a drive), and within budget (working on a tight budget here…).
It was a fantastic experience. Many, many thanks to the hosts (Quicken Loans), sponsors (CBI, RiskLens, BT, and BitSight), organizers (David Musselwhite and team), … The venue was great, and it was wonderful to see how the team is proud of Detroit and the turnaround that is happening.
My plan is to post a quick summary of the sessions and then, later, some more general comments. There was a decent amount of live tweeting (spread between three hashtags: #SIRAcon2015, #SIRAcon15, and #SIRAcon), but I thought a quick summary of each session would be a nice idea too.
Warning: my ‘starstruckness’ was out in full force. Totally justified 🙂
Doug and Richard opened up SIRAcon with a tour-de-force on applying quantitative methods to Risk analysis. They presented interesting findings showing that an appreciation of qualitative methods seems to be correlated with less comfort/familiarity with statistics concepts. To me, this presents a fantastic opportunity to pursue better dialogue through education 🙂
I loved the message that ‘we don’t have enough data’ is not an excuse. They presented a good case for using the beta distribution as a stepping stone from a world of ‘no data’ (where the uniform distribution applies) to a scenario where data is available.
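To make that stepping stone concrete, here is a minimal R sketch of my own (not code from the talk, and the counts are invented): a uniform prior is simply Beta(1,1), and each observation just updates its parameters.

prior_a <- 1; prior_b <- 1                 # Beta(1,1) is the uniform: the 'no data' starting point
hits <- 3; misses <- 17                    # hypothetical observations: 3 incidents in 20 trials
post_a <- prior_a + hits                   # conjugate update: add the successes...
post_b <- prior_b + misses                 # ...and the failures to the Beta parameters
qbeta(c(0.05, 0.95), prior_a, prior_b)     # 90% interval before any data: very wide
qbeta(c(0.05, 0.95), post_a, post_b)       # 90% interval after the data: much tighter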
Oh, bonus points for Latinizing the [in]famous bear analogy as ‘Exsupero Ursus‘ 🙂
Jay presented an interesting framing of Information Security as a ‘Wicked Problem’ and offered the Cynefin Framework as a basis for discussing how notions of good/best/current/… practice apply to our problem space.
Later, Jay and Tom presented several interesting exploratory data visualizations looking into how SSL/TLS practices correlate with botnet activity, as well as how indicators such as BitTorrent traffic appear related to botnet activity and breaches.
I think it was a perfect example of how a data-driven approach to security can lead to insights we would not otherwise have.
J. Wolfgang Goerlich (@jwgoerlich) covered the topic of Culture and the relation to Risk, something he’s been deeply involved in. He collaborates with Kai Roer (@kairoer) on the excellent Security Culture Framework. There were several good examples of how changing user behaviour led to successful outcomes: security awareness training, SDLC, DLP, and physical security. More than that, though, he emphasized the importance of proper feedback loops when addressing culture changes, as well as what I thought was one of the most important messages: culture changes “one conversation at a time”.
Barton Yadlowski (@bmorphism) is an applied mathematician at HurricaneLabs, and he presented an introduction to, and the case for, machine learning in InfoSec, with examples using Splunk, scikit-learn, and Spark. He showed how tools such as Splunk can help with unstructured information and normalization, followed by exploratory data analysis. From there, he gave an interesting introduction to broad Machine Learning topics and how they can be used to detect anomalies in different scenarios.
It’s always nice to start putting together the description of methods floating around with more practical applications.
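In that spirit, here is a toy sketch of my own – nothing from the talk – of what ‘detecting anomalies’ can look like at its very simplest, using a robust median/MAD threshold over a made-up series of hourly event counts.

set.seed(42)
logins <- c(rpois(200, lambda = 30), 95)      # 200 'normal' hours plus one fabricated spike
threshold <- median(logins) + 5 * mad(logins) # robust upper bound for "normal"
which(logins > threshold)                     # index of the anomalous hour (the spike we added)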
Karl Schimmeck (@kschimmeck) covered an effort by SIFMA (Securities Industry and Financial Markets Association, an industry association of 300+ financial services firms) to simplify the process of performing 3rd-party risk assessments. This is extremely important for reducing compliance costs for financial services firms and vendors alike, and hopefully it will be adopted by the regulators and the auditing organizations. Using SharedAssessments and SOC2 as initial guidelines, then mapping specific custom requirements and later mapping to NIST-CF, it looks very promising.
As someone who has been on the receiving end of those questionnaires, I really(!) look forward to this effort being successful.
Jack Whitsitt (@sintixerr) led us down a different path. Drawing on his broad experience and recent activities well beyond typical InfoSec, he urged us all to consider the much broader environment in which InfoSec exists. There are fundamental issues at multiple levels of abstraction – from the individual all the way to the global – and, when it comes to organizations, how can we deal with (and support) InfoSec teams being thrown into the middle of geopolitical conflicts?
I loved the talk, but I would like us to explore better the assumption that things are getting worse: are we being affected by the availability bias of all the breaches? That’s an open question (to me, at least).
Thomas Lee from Vivo Security stayed consistent with the ‘quantitative’ theme of SIRAcon and looked at some interesting correlations between factors that may be related to breaches/compromise. He then made a strong case for adopting a more ‘actuarial’ approach to security programs, taking a closer look at loss data as a method of selecting security controls. He presented an example of applying this methodology to a mid-sized pharmaceutical company, showing how performing an endpoint update was actually a great way of reducing the impact of phishing.
Personally, I think the approach has merit, as long as we can avoid the trap of spurious correlations. I would have liked to have seen more confidence intervals there too 🙂
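For what it’s worth, here is the kind of thing I mean in a couple of lines of R (the counts are hypothetical, not from the talk): even a simple breach-rate estimate can carry an interval showing how much the data actually supports it.

# Hypothetical: 12 out of 80 companies with a given characteristic suffered a breach.
binom.test(x = 12, n = 80)$conf.int           # 95% confidence interval for the breach rate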
Michael Roytman (@mroytman) needs no introductions. His talk brought together concepts that have been around us for a while, coming from the likes of Schneier, Geer, Hutton, Ed Bellis, and others in a discussion of the interplay between Metrics, Data, and Automation. He clearly demonstrated how attackers are able to leverage automation in attacks much better than defenders are able to do so for defense. He also gave a great example of how better datasets can fundamentally change the whole ecosystem: Uber. By having better data about passenger demand (along with other things, of course), Uber has become the market-changing force we all know.
We all throw ideas around ‘what is a good metric’ and ‘how we can better automate’. This talk helped a lot.
Allison Miller (@selenakyle) closed off the first day with a topic that is very near and dear to me: drawing concepts from Economics into InfoSec and Risk. I’m a huge fan of her work, and this was no exception. Following a quick look into how microeconomics topics such as maximization of utility and utility curves work, she clearly demonstrated how, given an expected value (mean), a posture of risk aversion manifests itself as the desire for smaller expected variance. She then chose to explore possible linkages between InfoSec/Risk and macroeconomics topics, including a great tie-in to the Lucas critique. She has mentioned before the possible use of a ‘Security CPI‘ but now called out the possibility of defining ‘security econometrics’. Very thought-provoking indeed.
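As a side note, the mean-versus-variance point is easy to see in a few lines of R. This is my own toy illustration with invented numbers (not material from the talk), using a concave utility function to stand in for risk aversion.

set.seed(1)
wealth <- 100
loss_low  <- rnorm(1e5, mean = 10, sd = 2)    # same expected loss...
loss_high <- rnorm(1e5, mean = 10, sd = 8)    # ...but much higher variance
u <- function(w) log(w)                       # concave utility, i.e. risk aversion
mean(u(wealth - loss_low))                    # higher expected utility
mean(u(wealth - loss_high))                   # lower expected utility, despite the equal mean loss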
Day 2 post coming up soon…
NOTE: If this summary is at all interesting, know that SIRA recorded the event and that, if I understood it right, video will be made available to members (hint, hint, …) soon.
So, after what seems like a long time just trying to learn R on my own (I’m well into the Coursera / Johns Hopkins Data Science specialization), I finally came across a work-related problem I could try some very simple scripting on. Like a pre-schooler eager to show his work, here’s a blog post 🙂
Hoping others find it useful, or can point to a better way of doing this.
One of the products I work with analyzes HTTP[S] sessions, looking for malicious application-level behaviour (really cool stuff IMHO, but not the focus of this post). To do that, we need to be able to reconstruct the entire HTTP stream based on network traffic captured with a variety of methods (SPAN ports, network taps, monitoring switches, or in some cases cloned traffic from a load balancer). The “quality” of that captured stream is key – if we lose too many packets, we can’t reliably follow the TCP streams, which means we may miss user clicks on the site.
Note that this is different from analyzing Web logs (Apache, nginx, …), as those log files often only have partial information about a click, whereas a full traffic capture offers much, much more.
One of the ways we analyze that quality is by estimating how many sessions had “lost” packets during a monitoring window. This requires a little bit of tinkering as some lost packets may refer to sessions already in flight, so just counting lost packets as a proportion of total will be misleading. We need to count how many NEW sessions in our monitoring window have had lost packets.
Using whatever method you prefer, capture traffic in pcap format. This is often done with a server plugged in the capture destination, using tcpdump to write contents to a file.
Open up Wireshark on a separate PC and load the pcap file.
Then, conduct two separate analyses:
- First, we create a list of all TCP sessions of interest that started within the monitoring window. We do this by applying the following filter:
tcp.flags.syn==1 && tcp.flags.ack==0
then exporting the resulting list (select File->Export Packet Dissections…) as a CSV (call it “Sessions.csv”) without packet details.
The output looks something like this (sanitized):
"1195","0.134502000","10.8.15.216","192.168.123.82","TCP","62","50680 > 443 [SYN] Seq=0 Win=4380 Len=0 MSS=1460 SACK_PERM=1"
- Now, we create a list of all events where Wireshark detected missing TCP segments. This can be done with Wireshark’s TCP analysis flags – a filter along these lines (matching the “lost/unseen segment” annotations) works:
tcp.analysis.lost_segment || tcp.analysis.ack_lost_segment
Again, export it to a CSV (call it “Lost.csv”) as above without adding packet details.
This is what the [sanitized] output looks like:
"1031","0.114669000","10.1.205.60","192.168.123.134","TCP","66","[TCP ACKed unseen segment] 49856 > 443 [ACK] Seq=757 Ack=35816 Win=45192 Len=0 TSval=1428649841 TSecr=188814810"
At first I took a ‘quick & dirty’ approach, using Excel(!) to compare these files, but that is not repeatable. Let’s try a little R…
Now that we have the two CSV files, the R script below tells us exactly what we need to know – what percentage of NEW TCP sessions, started within the monitoring window, had at least one “lost segment”.
(Notice that the “webservers” variable is just something I used to filter out unwanted traffic from the pcap and has been sanitized in the example below.)
library(dplyr)
library(stringr)

lostfile    <- "Lost.csv"
sessionfile <- "Sessions.csv"
webservers  <- c("192.168.80")   # web servers of interest (sanitized)

lost     <- read.csv(lostfile, stringsAsFactors = FALSE)
sessions <- read.csv(sessionfile, stringsAsFactors = FALSE)

# Build a "source socket" (IP:port) identifier for every new session...
df_sessions <- sessions %>%
  filter(grepl(webservers, Destination)) %>%
  mutate(SrcPort = gsub(" >", "", str_extract(Info, "(\\d+) >"))) %>%
  mutate(SrcSocket = paste(Source, ":", SrcPort, sep = ""))

# ...and the same for every "lost segment" event
df_lost <- lost %>%
  filter(grepl(webservers, Destination)) %>%
  mutate(SrcPort = gsub(" >", "", str_extract(Info, "(\\d+) >"))) %>%
  mutate(SrcSocket = paste(Source, ":", SrcPort, sep = ""))

# Sessions appearing in both lists started in the window AND lost at least one segment
badsessions    <- intersect(df_sessions$SrcSocket, df_lost$SrcSocket)
df_badsessions <- df_sessions[df_sessions$SrcSocket %in% badsessions, ]

n_sess <- nrow(df_sessions)
n_bad  <- nrow(df_badsessions)
print(paste("Total Sessions:", n_sess))
print(paste("Bad Sessions:", n_bad))
print(paste("Percentage:", round(n_bad * 100 / n_sess, digits = 2)))
Results and Conclusion
Running the script prints the total number of new sessions, the number of “bad” sessions (those with at least one lost segment), and the resulting percentage. Simple and to the point.
I’m sure there are better ways to achieve the same goal: some tshark foo and scripting are obvious candidates, but in many cases we need to simplify the initial capture process as much as possible. Asking for a pcap for us to process is as easy as it gets.
So, I’m excited to finally have had the opportunity to use R for a ‘real-life’ scenario that I can share. Let me know how I can do better next time.